Clinical Architecture Blog

Understanding ICD-10-CM - Part III - A Terminology by the Book

When you are trying to implement a terminology in a software application, the intent of the terminology can have a significant impact on how easy or difficult that process will be.  A terminology that was designed for software tends to elegantly support the patterns of use.  This type of terminology, if you are a data nerd like I am, is a delight to work with and makes the life of the integrator much easier.

Unfortunately, ICD-10-CM is not that kind of terminology.  ICD-10-CM is clunky, unstable and requires significant manual intervention to leverage the information in the format provided by CMS.  If you are wondering why this would be the case, you just have to remember that ICD-10-CM is first and foremost… a book.

To understand a terminology, you need to know its history, purpose and drivers.  When you examine the ICD-10-CM terminology, it is easy to see that it has been, and still is, meant to be a book.  Not the kind of book you want to curl up with and read in a Naugahyde chair by the fireplace.    It is the book you use to find codes and perhaps kill a large spider with when you are not wearing shoes.

You might be asking “Charlie, how can you say it is a book?  What evidence do you have to support this outlandish claim?”  I will back up my claim with irrefutable evidence.

The Evidence

Exhibit A - Loosely coupled and unstable hierarchy

The codes are organized first into chapters, twenty-one to be exact. 

These chapters divide the codes into groupings based on a combination of etiology, body system and code purpose.  These chapters do not have stable identifiers, but are typically correlated to numbers based on the chapter sequence.

Each chapter is divided into sections.

These sections logically group the rubrics (three digit codes) into a code range.  The sections do not have stable identifiers but are typically associated with the code ranges.  For example, in chapter 11, “Diseases of the digestive system,” there is a section called ‘Diseases of appendix’ that covers rubrics from K35 to K38.

In a true terminology, the organizational hierarchy would have stable identifiers that are unambiguously linked to the rubrics.  In this case, the chapters and sections would be classes and subclasses used to organize and navigate the rubrics they contain.

Chapters and sections are artifacts typically found in a…what do you call it?  Oh yeah… a book.
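
For the engineers in the audience, here is a minimal sketch of what "stabilizing" a section might look like.  The `Section` class, the `SEC:` identifier scheme and the `section_for` helper are my own illustrative assumptions, not anything CMS provides, and only two sections from chapter 11 are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    first: str   # first rubric in the range, e.g. "K35"
    last: str    # last rubric in the range, e.g. "K38"
    title: str

    @property
    def stable_id(self) -> str:
        # Hypothetical identifier scheme: derive the key from the code range,
        # because the source content does not supply one.
        return f"SEC:{self.first}-{self.last}"

    def contains(self, rubric: str) -> bool:
        # Plain string comparison; adequate for letter + two-digit rubrics.
        # Rubrics with a letter in the third position (e.g. C7A) do not sort
        # in book order and would need an explicit ordering table.
        return self.first <= rubric <= self.last


SECTIONS = [
    Section("K35", "K38", "Diseases of appendix"),
    Section("K40", "K46", "Hernia"),
]


def section_for(code: str) -> Optional[Section]:
    """Return the section whose range covers the code's rubric, if any."""
    rubric = code[:3]  # the rubric is the part before the decimal
    for section in SECTIONS:
        if section.contains(rubric):
            return section
    return None
```

The point of the sketch is that the implementer, not the source content, has to invent the identifier and the linkage between section and rubric.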

Exhibit B - Alpha Indexes instead of synonyms

ICD-10-CM has four “Alpha Indexes”.  The Alphabetic Index consists of the following parts: the Index of Diseases and Injury, the Index of External Causes of Injury, the Table of Neoplasms and the Table of Drugs and Chemicals.

The idea is that the user goes to the appropriate alpha index and finds the words they are looking for and that listing either directs them to the appropriate code OR redirects them to another word in the alpha index.

For example: 

Let’s say you were looking for the code for “Abdominalgia”…

You would go to the “Index of Diseases and Injury” and run your finger down the list until you find it.  Next to it you would see the following:

See Pain, Abdominal

You would then flip the index to the P’s and run your finger down the list to “Pain, Abdominal”, where you would see 16 codes.  In that list you find the one you want, “Pain, abdominal, rebound”, and next to it you see the following:

See Tenderness, abdominal, rebound

So you flip the index again to the T’s and find what you are looking for, “Tenderness, abdominal, rebound”.  Next to it you see eight codes, and you pick the basic one: R10.829

The external cause index works in a similar fashion to this.
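
If you had to do that finger-walking in software, it might look something like the sketch below.  The `IndexEntry` structure and the `follow` function are assumptions made for illustration; the index as delivered is not organized this way, and only the entries from the walkthrough are modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    see: Optional[str] = None        # redirect to another index term
    codes: Tuple[str, ...] = ()      # codes listed next to the term


# Only the entries from the walkthrough above; the codes listed under
# "pain, abdominal" are omitted here rather than guessed.
INDEX: Dict[str, IndexEntry] = {
    "abdominalgia": IndexEntry(see="pain, abdominal"),
    "pain, abdominal": IndexEntry(),
    "pain, abdominal, rebound": IndexEntry(see="tenderness, abdominal, rebound"),
    "tenderness, abdominal, rebound": IndexEntry(codes=("R10.829",)),
}


def follow(term: str) -> Tuple[List[str], Tuple[str, ...]]:
    """Follow 'see' redirects; return the terms visited and the final codes."""
    path: List[str] = []
    current = term.lower()
    while True:
        if current in path:
            raise ValueError(f"circular 'see' reference at {current!r}")
        path.append(current)
        entry = INDEX[current]
        if entry.see is None:
            return path, entry.codes
        current = entry.see


path, codes = follow("Pain, abdominal, rebound")
```

Note that the step where a human picks “Pain, abdominal, rebound” out of the list is not something the redirect chain can do for you; the sketch only automates the flipping.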

The drug and neoplasm tables are used to establish lookup grids for pre-coordinated scenarios.
In the Table of Neoplasms, the alpha index items are anatomic locations for each row and each column represents the nature of the neoplasm (Benign, In Situ, Uncertain, Unspecified, Malignant Primary, Malignant Secondary) with additional “manifestation” codes and “see also” codes listed when appropriate.

In the Table of Drugs and Chemicals, the alpha index items are the drug formulations for each row and each column represents a drug-related event (Adverse Effect, Underdosing, Intentional Poisoning, Accidental Poisoning, Assault Poisoning and Undetermined Poisoning) with “see also” codes listed when appropriate.

Alphabetic indexes are something you find in the back of (wait for it) … a book.

Exhibit C - Important relationships and associations conveyed as unstructured text

ICD-10-CM codes have relationships to other codes.  They might be designated as ‘Code Also’, ‘Use Additional’, ‘Code First’, ‘Excludes1’ (not coded here; should never be coded together), ‘Excludes2’ (not included here, but may be coded together) and ‘Includes’ (which is actually more synonyms).  How are these relationships provided?  As wildcarded text in the chapter, section or code header associated with the codes in the book.

Here is an example of this from the XML provided by CMS:

E09 - Drug or chemical induced diabetes mellitus

Code First

poisoning due to drug or toxin, if applicable (T36-T65 with fifth or sixth character 1-4 or 6)

Use additional

code for adverse effect, if applicable, to identify drug (T36-T50 with fifth or sixth character 5)
code to identify any insulin use (Z79.4)

Excludes1

diabetes mellitus due to underlying condition (E08.-)
gestational diabetes (O24.4-)
neonatal diabetes mellitus (P70.2)
postpancreatectomy diabetes mellitus (E13.-)
postprocedural diabetes mellitus (E13.-)
secondary diabetes mellitus NEC (E13.-)
type 1 diabetes mellitus (E10.-)
type 2 diabetes mellitus (E11.-)

As humans reading a book, we can see the above information and sort out the details.  If ICD-10-CM were meant to be a terminology, this ontological information would be conveyed via concrete relationships between terms.  This would make it easier for the software to assist the human in understanding how the codes should be used together to fulfill their objective.

Terminologies provide relationships as defined links between terms, a structure typically referred to as triples (code->relationship type->code).  This makes them easier for software to consume so that it can provide concrete assistance.
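
To make the contrast concrete, here is a rough sketch of turning the free-text notes into triples.  The `notes_to_triples` function, the `use_additional` label and the regular expression are illustrative assumptions, and the pattern only handles the easy case of a single code in parentheses.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (source code, relationship type, target code)

# Matches a single parenthesised code reference such as (E08.-) or (P70.2).
# Ranged expressions like (T36-T50 with fifth or sixth character 5) are
# deliberately not matched: they need real interpretation, not a regex.
CODE_REF = re.compile(r"\(([A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]*-?)?)\)")


def notes_to_triples(
    source: str, relationship: str, notes: List[str]
) -> Tuple[List[Triple], List[str]]:
    """Split notes into extracted triples and notes left for a human."""
    triples: List[Triple] = []
    unresolved: List[str] = []
    for note in notes:
        match = CODE_REF.search(note)
        if match:
            triples.append((source, relationship, match.group(1)))
        else:
            unresolved.append(note)
    return triples, unresolved


triples, unresolved = notes_to_triples("E09", "use_additional", [
    "code for adverse effect, if applicable, to identify drug "
    "(T36-T50 with fifth or sixth character 5)",
    "code to identify any insulin use (Z79.4)",
])
```

The insulin note yields a clean triple; the ranged note lands in `unresolved`, which is exactly the part that needs human or more advanced algorithmic interpretation.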

Exhibit D – Um… it’s a BOOK!
Ladies and gentlemen – photographic evidence that the defendant is, in fact, a book.   The book format is its primary delivery system, be it in electronic or hardback form.  The prosecution rests.
(Sorry – watched a Matlock marathon last weekend and got a little carried away…)

What does it all mean?

As I said at the beginning of this post, to understand a terminology, its capabilities and limitations, it is always best to understand its origins, code structure and intent.  In the case of ICD-10-CM, the fact that it is designed to be first and foremost a book just means that we need to lower our expectations of its capabilities as a terminology.

In the case of ICD-10-CM, what this means is that in its “natural” form (for example, if you get the XML from CMS like we do), the content has the following limitations:

1. It is challenging to search for a term – in its natural state, all synonymy is trapped in the indexes.  The indexes are not structured in a way that makes them easy to implement.  This makes ICD-10-CM difficult to integrate into an application unless you do some work to build or license an interface terminology.

2. It is challenging to browse the hierarchy – while the rubric-based hierarchy is logical, it is not provided as an actual hierarchy with relationships.  Also, since the chapters and sections lack stable identifiers, it is up to the implementer to stabilize them and synthesize the implied hierarchy for the chapter, section and rubric.

3. It is challenging to traverse relationships – relationships like “code first,” “code also,” “excludes,” etc. are not managed as triples, but rather as ranged free text expressions, so there is no practical way to navigate them without human or advanced algorithmic interpretation.

In healthcare, we are still in transition, still evolving away from a model in which humans in back rooms are mapping and coding.  ICD-10-CM is a remnant of that old model.  In the domain of “modern” informatics, the latest and greatest terminology we are being mandated to use is an anachronism, the equivalent to a View-Master in a world full of Virtual Reality headsets.

What can we do about it?

We can find ways to cope with it.  ICD-10-CM is moving ahead and, despite the issues it represents, I think it is the right decision, mostly because as an industry we have put too much time and money into making it happen.  If we don’t do it now, how much credibility will we be able to salvage?  At this point, it is a moral imperative.

The Silver Lining

Sometimes you have to get to a tipping point, a point where you say to yourself, “Wow, that was not a good idea.”  This tipping point forces you to reconsider the direction, perhaps even to question everything.   It is how we gain experience.  I believe that ICD-10-CM is that tipping point for pre-coordinated terminologies in healthcare that could help us to change how we think in the future.

In the meantime, at Clinical Architecture, we have created a version of ICD-10-CM in which we have transformed what we found in the ICD-10-CM XML from CMS into an actual stable ontology.  We provide this at no additional charge to our Symedical clients (along with over 400 other content assets).  We will also be making this available as a flat file data offering in our Content Cloud, which will be available at the end of July.  Feel free to contact us if you would like to learn more.

I am going to take a break from ICD-10-CM for a while.  If you have any questions or additional thoughts, please leave them in the comments and I will be happy to share and address them.

Posted: 6/23/2015 11:11:46 AM by Charlie Harp | with 0 comments

Spring 2015 Update

This is a brief intermission from your regularly scheduled blog entries…

I wanted to pop in and share with you an update from our offices in Carmel, Indiana. This is a slight departure from our usual blog content, but we wanted to let you know what’s been going on at Clinical Architecture so far this year.


Last month we made the quick drive up to Chicago for HIMSS15 and had a great week giving demos, discussing business, seeing old friends and meeting new people. One of the things we did differently this year was to replace our usual gadget raffle with a donation sweepstakes. Each day we had a drawing and the winner received the opportunity to designate a charity to receive a $500 donation. The response to this was amazing and I was the lucky person who got to notify the winners. Not only were they excited that they won, they were all truly passionate about giving back and thankful for the opportunity to select a charity to receive the donation. Here are the charities the winners selected:

Our booth at HIMSS15:


New Location

In March, we opened an office in Salt Lake City, Utah. Our Chief Informatics Officer, Shaun Shakib, is holding down the fort at this new location and enjoying a beautiful view of the mountains.

View from the Salt Lake City Office:


New Faces

We are welcoming several new faces to our team this spring. Stephanie Broderick has joined Clinical Architecture as Vice President of Product Management. Stephanie brings over 25 years of product management and software development leadership to Clinical Architecture, with 20 of those years spent creating healthcare IT solutions for First Databank and Medi-Span. We are also adding to our Customer Support and QA groups with several recent graduates from Indiana University and Xavier University. We are always looking for great talent. Let us know if you or someone you know is interested in working with us.

Cool New Products

We are excited to announce we are launching a new product this summer. Keep your eye out for updates about our standalone content delivery system, which will simplify obtaining and updating the terminology assets you need.

Come and see us!

We are looking forward to HIMSS16 in Las Vegas next year! But before then, you can visit us at this year’s AHIMA show in New Orleans, Louisiana. We will be at the Annual AHIMA Convention September 27th – 30th at booth 551. Stop by and see us if you are attending!

Charlie will continue your regularly scheduled blogging with his series about ICD-10 in the next few weeks.


Amanda O'Rourke, Director of Marketing
Posted: 5/15/2015 6:21:04 AM by Amanda O'Rourke | with 0 comments

Understanding ICD-10-CM - Part II - What's In A Code?

To understand a terminology it is important to understand its history, coding scheme, content model, the intended use cases and editorial policy.  In my previous post, I reviewed the origin of ICD-10.  In this post I am going to spend some time talking about the ICD coding scheme and, to be honest, rant a little about structured coding schemes in general.

A coding scheme is the method that is used to assign an identifier to the terms in a terminology.  In the history of our industry coding schemes have been important and in some cases iconic.  Not sure what I mean?  I am sure at least some of you reading this post can look at the following list of codes and identify the source of each on sight.  In fact, there are likely some of you that can tell me what the term is for a few of those codes. 

1.  55454-3
2.  250.02
3.  E11.9
4.  55289-211-60
5.  3E013VG
6.  1-800-783-3637

 (Let me know if you get them all – no cheating – answers at the end of the post)
The fact that these codes are recognizable is not an accident.  In some cases it is just an iconic pattern and in others it is all part of the logic used to create them.  This is because all but one of the above codes is something called a meaningful, or structured, code.

As people we do not typically communicate with codes.  Codes are something we created so computers could better cope with the ambiguities of human language and conception.  So the target audience for codes is computers, or more appropriately, software.  The only reason we need to work with codes is so we can pick the right codes to properly convey information to our silicon-based counterparts.  Historically our ability to interface with software has been, for lack of a better word, rustic.

In many cases this required some of us (the lucky ones?) to learn the language of codes so we could speak directly to the software.  The problem is, our brains are not typically wired for codes. 

Magic Number Seven

In studies conducted during the 1950s it was shown that the capacity of a typical person’s working memory is about seven things, give or take two.  This led to what has been dubbed the “magic number seven” by psychologists.

On May 4, 1955 (60 years ago to the day – next Monday), psychologist George A. Miller wrote a fascinating paper on the topic that begins with a great paragraph about the number seven:

“My problem is that I have been persecuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.”

Dr. Miller has an engaging writing style and the paper itself is an enjoyable read, if a bit steep at times.  It was written during the early days of information theory.  In the paper he explores our human limitations with respect to retaining and processing unidimensional information.  He also describes coping mechanisms that we employ to expand these limits.

 “[The] span of absolute judgment and the span of immediate memory impose severe limitations on the amount of information that we are able to receive, process, and remember. By organizing the stimulus input simultaneously into several dimensions and successively into a sequence of chunks, we manage to break (or at least stretch) this informational bottleneck.”

And concludes with his thoughts on the number seven:

“And finally, what about the magical number seven? What about the seven wonders of the world, the seven seas, the seven deadly sins, the seven daughters of Atlas in the Pleiades, the seven ages of man, the seven levels of hell, the seven primary colors, the seven notes of the musical scale, and the seven days of the week? What about the seven-point rating scale, the seven categories for absolute judgment, the seven objects in the span of attention, and the seven digits in the span of immediate memory? For the present I propose to withhold judgment. Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it. But I suspect that it is only a pernicious, Pythagorean coincidence.”

(If this were an informatics nerd showdown, this is where Dr. Miller would drop the microphone...Respect)

This notion of “chunking”, or organizing information into patterns and/or applying some logic to the organization of those bits, is a brain hack that allows us to expand beyond our natural limitations.

Structured coding schemes take advantage of this hack by chunking meaningful information into short codes so that we can remember blocks of information and, when applicable, use logic to determine the rest.  This is evident in telephone numbers (3 digit area code, 3 digit prefix – 4 digit suffix), postal codes (5 digit number), social security numbers (3 digit code, 2 digit code, 4 digit code) and of course ICD codes.

In fact if you look at the code examples I listed in the “quiz” you can see evidence of this type of mechanism.  That’s right…I hacked your brain.

The ICD Scheme

The coding scheme for the ICD family has been a logical hierarchy for a while now.  From ICD-6 through ICD-9 it has been a numeric three digit root code where the first two digits represented a category and the third digit represented a significant axis, like anatomic location.  This base code (referred to as the “rubric”) is followed by a period and then up to two additional digits representing the etiology and sub-classification.  ICD-9 later added the V and E codes, with the letter in the first position, to support external causes and clinical modifiers.

The coding scheme in ICD-10 is still a logical hierarchy, but it has adopted an alphanumeric approach which gave it some room to maneuver.  Like its predecessor, it starts with a three byte rubric.  In ICD-10 the rubric represents a category.  In the international ICD-10 the rubric is a letter followed by two digits.  In the 2011 version of ICD-10-CM this was changed to allow the third character to be a digit or a letter, due to the need to expand the rubrics.  In the 2014 edition of ICD-10-CM there are 567 codes where the third character is an ‘A’ and 9 codes where the third character is a ‘B’.

The General Structure

In general the rules of the ICD-10-CM scheme are as follows:
  • Consists of three to seven characters
  • First character is alpha (All letters used except 'U' - 'U' are not invited...)
  • Second character is numeric
  • Third character can be alpha or numeric
  • Decimal placed after the first three characters
  • Fourth, fifth, sixth, and seventh digits can be alpha or numeric

Here is a basic picture of a full seven digit ICD-10-CM code:

Not all codes include all of the positions and the visit encounter digit is typically used for injury and external cause-related codes.
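
As a rough sketch, the rules listed above translate into a shape check like the one below.  The names `ICD10CM_SHAPE`, `ParsedCode` and `parse_icd10cm` are my own, and the pattern encodes only the rules as stated in this post; it checks shape, not whether a code exists in any given release.

```python
import re
from typing import NamedTuple, Optional

# Structural rules as listed above: shape only, no lookup against real content.
ICD10CM_SHAPE = re.compile(
    r"^(?P<rubric>[A-TV-Z][0-9][0-9A-Z])"   # letter (not U), digit, letter or digit
    r"(?:\.(?P<rest>[0-9A-Z]{1,4}))?$"      # optional decimal plus 1-4 characters
)


class ParsedCode(NamedTuple):
    rubric: str
    rest: str   # characters four to seven, "" if absent


def parse_icd10cm(code: str) -> Optional[ParsedCode]:
    """Return the rubric and trailing characters, or None if the shape is wrong."""
    match = ICD10CM_SHAPE.match(code.strip().upper())
    if match is None:
        return None
    return ParsedCode(match.group("rubric"), match.group("rest") or "")
```

Treat a successful parse as "looks like an ICD-10-CM code" and nothing more.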

The Logical Hierarchy

A simple example of the hierarchical nature of the ICD-10-CM structure for a neoplasm code is illustrated below:
D30 Benign neoplasm of urinary organs
D30.0 Benign neoplasm of kidney
D30.00 Benign neoplasm of unspecified kidney
D30.01 Benign neoplasm of right kidney
D30.02 Benign neoplasm of left kidney
You will notice that:
  • The category of D30 represents the type of neoplasm and the anatomic region (in this case)
  • The anatomic location of kidney is the ‘0’ in the fourth position
  • Unspecified is the ‘0’ in the fifth position
  • Right is the ‘1’ in the fifth position
  • Left is the ‘2’ in the fifth position
Neoplasms and diseases tend to follow this simple pattern.  But the positions of the relative aspects can vary based on the category.  The values used for a given anatomic location or laterality are mostly consistent but that is generally the same as inconsistent so you should not rely on them programmatically.
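
The one thing the structure does reliably give you is the implied parent chain, which you get by dropping trailing characters.  The `ancestors` helper below is a hypothetical sketch of that idea; it assumes every truncation is a meaningful parent, which holds for the D30 example but not necessarily for codes padded with placeholder characters.

```python
from typing import List


def ancestors(code: str) -> List[str]:
    """Return the code followed by its implied parents, most specific first."""
    compact = code.replace(".", "")
    chain = []
    # Drop one trailing character at a time until only the rubric remains.
    for length in range(len(compact), 2, -1):
        prefix = compact[:length]
        chain.append(prefix if length == 3 else f"{prefix[:3]}.{prefix[3:]}")
    return chain


chain = ancestors("D30.01")   # D30.01, then D30.0, then D30
```

Notice that this recovers the hierarchy but tells you nothing about what each position means, which is the part you should not rely on programmatically.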
For a fracture-related code you will notice that the structure is a little more involved:

M80 Osteoporosis with current pathological fracture
M80.0 Age-related osteoporosis with current pathological fracture
M80.00 Age-related osteoporosis with current pathological fracture, unspecified site
M80.00XA Age-related osteoporosis with current pathological fracture, unspecified site, initial encounter for fracture
M80.00XD Age-related osteoporosis with current pathological fracture, unspecified site, subsequent encounter for fracture with routine healing
M80.00XG Age-related osteoporosis with current pathological fracture, unspecified site, subsequent encounter for fracture with delayed healing
M80.00XK Age-related osteoporosis with current pathological fracture, unspecified site, subsequent encounter for fracture with nonunion
M80.00XP Age-related osteoporosis with current pathological fracture, unspecified site, subsequent encounter for fracture with malunion
M80.00XS Age-related osteoporosis with current pathological fracture, unspecified site, sequela

The seventh character is sometimes referred to as the extension.  For injuries and external causes the codes that are used for the encounter types are:
  • A - Initial encounter
  • D - Subsequent encounter
  • S – Sequelae (a secondary result)
For fractures specifically, additional codes are needed to reflect additional fracture-related details:
  • A - Initial encounter for closed fracture
  • B - Initial encounter for open fracture
  • D - Subsequent encounter for fracture with routine healing
  • G - Subsequent encounter for fracture with delayed healing
  • K - Subsequent encounter for fracture with nonunion
  • P - Subsequent encounter for fracture with malunion
  • S - Sequelae
The extension always goes in the seventh character.  If there is no value for the sixth character, as in the example above, it is filled in with an ‘X’ (like a short cartoon expletive…).
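
A minimal sketch of that padding rule follows.  The `with_extension` helper is hypothetical, and it generalizes the rule slightly by padding every empty position up to the sixth with an ‘X’, which goes a step beyond the single-X case shown in the example.

```python
def with_extension(code: str, extension: str) -> str:
    """Append a seventh-character extension, padding empty positions with 'X'."""
    if len(extension) != 1:
        raise ValueError("extension must be a single character")
    compact = code.replace(".", "").upper()
    if not 3 <= len(compact) <= 6:
        raise ValueError("expected a code of three to six characters")
    # Fill positions up to the sixth with the placeholder, then add the extension.
    padded = compact.ljust(6, "X") + extension.upper()
    return f"{padded[:3]}.{padded[3:]}"


initial = with_extension("M80.00", "A")   # the M80.00XA code from the list above
```

The helper only builds the string; whether a given extension is allowed for a given category still has to come from the content itself.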

Rule 6! ... There is no… rule 6

In some states it is illegal to borrow your neighbor’s vacuum cleaner and in others it isn’t.  The logical rules of the ICD-10-CM coding scheme are like that.  In some sections of the terminology (based on a range of initial alpha characters) the rules for the structure shift around and in some cases the rules within a given neighborhood are violated outright.  What this means is that you cannot rely on the "logical" rules.  This leads me to ask: why even pretend to have rules?  And… when can I get my vacuum cleaner back?

Logical Coding Scheme Rant – as promised

Logical hierarchy coding schemes represent an interesting dichotomy.  Like their more expressive cousin, mnemonics, they were born in a time when the population of code systems was limited to a degree that a typical person could use brain hacking to reasonably access them.  The problem with using this approach today is that the same structural rigor that allows us to remember a meaningful code imposes limitations on the amount of information that can be represented in that structure.

For example, the first character of the ICD-10 rubric has exactly 26 possible values (A-Z) and, while the rubric was limited to a letter followed by two digits, each initial letter could have 100 possible values (00-99).  That gave us an upper limit of 2600 rubrics.  Beyond the rubric this also means that each position in the code has a limit of 10-36 values depending on whether you allow numbers, letters or both.  In the case of ICD-10-CM this limitation created an issue when some of the rubrics ran out of space (like ‘C7-’), which required a decision to be made: should new codes be added into other rubrics where there is space available, or should the previously existing rule that said a rubric was a letter followed by two digits be broken?  Obviously, they chose the latter.  The reason for a meaningful code is that it is predictable and we rely on that predictability in our minds and, for many of us, in our software’s logic.  How many systems had a regular expression that expects a letter followed by two digits to recognize or validate an ICD-10-CM rubric?
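
Here is a small sketch of both the capacity arithmetic and the kind of regular expression that gets caught out.  The pattern names and the `recognized_by` helper are illustrative assumptions, not taken from any particular system.

```python
import re

# The original assumption: a letter followed by two digits.
OLD_RUBRIC = re.compile(r"^[A-Z][0-9]{2}$")
# Relaxed pattern: the third character may be a letter or a digit.
NEW_RUBRIC = re.compile(r"^[A-Z][0-9][0-9A-Z]$")

# Capacity implied by each pattern.
old_capacity = 26 * 10 * 10   # 2600, the upper limit quoted above
new_capacity = 26 * 10 * 36   # third position widened to 0-9 plus A-Z


def recognized_by(pattern: re.Pattern, rubric: str) -> bool:
    """True if the rubric matches the given shape pattern."""
    return pattern.match(rubric) is not None
```

A rubric such as C7A passes the relaxed pattern and fails the original one, so any validation built on the old rule quietly rejects legitimate content.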

A logical hierarchy coding scheme also limits how the information can be represented.  Specifically, the structure of the code precludes anything other than a mono-hierarchy, unless you want to be in the business of duplicating codes.  This also means that if you want to change the location of a term in the hierarchy you need to change its code.   The National Drug Code (NDC) is a good example of this.  The first segment historically represented the labeler or manufacturer.   In the unlikely event that one of these pharma giants purchases another… the meaningful code becomes somewhat less meaningful.

The Donut Conundrum

Structured coding schemes are like donuts: I like them, but they ultimately lead to regret.  As an engineer, I am drawn to structured codes like a moth to a flame.  It is what I “grew up” with and it is much easier to remember a structured code than a thirty-six character GUID (globally unique identifier).  However, whenever I succumb to that siren song and create something with the limitation of a central number wheel or a logical structure, it almost always results in a hard choice or rework down the road.  Structured coding schemes are the comfort food from our software past that should be removed from the terminology food pyramid.

The Desiderata

In his “desiderata for controlled medical vocabularies”, Dr. James Cimino does an excellent job of explaining why meaningful codes are problematic and suggests instead what he calls a nonsemantic concept identifier.  I would go a step further and suggest that an identifier that is not a GUID also has limitations when it comes to creating and extending terminology… but that is a diatribe that would be better in a separate article.
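
To illustrate the idea, here is a minimal sketch of a concept that carries a nonsemantic identifier alongside its structured code.  The `Concept` class and its field names are my own assumptions, and the replacement code assigned at the end is purely hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Concept:
    term: str
    code: str   # structured code kept as an ordinary, changeable attribute
    concept_id: str = field(default_factory=lambda: str(uuid.uuid4()))


kidney = Concept(term="Benign neoplasm of right kidney", code="D30.01")
original_id = kidney.concept_id

# Re-coding the term (hypothetical new code) leaves the identifier untouched,
# so anything that stored concept_id keeps pointing at the same concept.
kidney.code = "D30.91"
```

The identifier says nothing about where the concept lives in the hierarchy, which is exactly why it survives a move.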

Invisible Terminology

I have a favorite guiding principle that you could apply to all user experience situations:

Technology is at its very best when it’s invisible. When you’re conscious only of what you’re doing, not the technology you’re doing it with.

We need to evolve to the point where the terminologies we use to drive our applications at the point of care do not limit the end user or require them to memorize codes to be efficient.  We will know we are doing it right when we stop talking about it.   If you wonder if technology can make this happen, answer the following question: How often do you dial an actual telephone number on your smart phone?

In my next post I will go into the overall structure of ICD-10-CM and provide more pragmatic insight for those of us that will be, and are, working with it. 

Thanks for Reading!
Answers to the “quiz”
   Code          Source      Term                                                       
1. 55454-3        LOINC       Hemoglobin A1C
2. 250.02         ICD-9-CM    Diabetes Mellitus without complications
3. E11.9          ICD-10-CM   Type 2 Diabetes Mellitus without Complications
4. 55289-211-60   NDC         GLUCOPHAGE 500 MG TABLET [PD-RX PHARM 60ea F/C]
5. 3E013VG        ICD-10-PCS  Intro of Insulin into SubQ Tissue, Percutaneous Approach
6. 1-800-783-3637 US Phone    Stanley Steemer (1-800-STEEMER) (go ahead... sing the rest)
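
For the curious, the "recognizable on sight" effect can be roughly sketched as a handful of shape patterns.  The `PATTERNS` table and `guess_source` function are assumptions for illustration; each pattern is a simplification that will happily accept strings that are not valid codes in the named system.

```python
import re
from typing import Optional

# Rough shape patterns for the quiz codes, checked in order.
PATTERNS = [
    ("LOINC",      re.compile(r"^\d{1,7}-\d$")),                  # number plus check digit
    ("ICD-9-CM",   re.compile(r"^\d{3}(\.\d{1,2})?$")),           # V and E codes omitted
    ("ICD-10-CM",  re.compile(r"^[A-TV-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$")),
    ("NDC",        re.compile(r"^\d{4,5}-\d{3,4}-\d{1,2}$")),     # three hyphenated segments
    ("ICD-10-PCS", re.compile(r"^[0-9A-HJ-NP-Z]{7}$")),           # seven characters, no I or O
    ("US Phone",   re.compile(r"^1-\d{3}-\d{3}-\d{4}$")),
]


def guess_source(code: str) -> Optional[str]:
    """Return the first source whose shape pattern matches, or None."""
    for name, pattern in PATTERNS:
        if pattern.match(code):
            return name
    return None
```

Run against the six quiz codes, each one lands on the source shown in the answer table, which is the brain hack expressed as regular expressions.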

Posted: 4/29/2015 9:59:16 PM by Charlie Harp | with 1 comments

Clinical Architecture at HIMSS 2015


At Clinical Architecture we are gearing up for the HIMSS conference next week. 

We will be in booth #2074 and would be delighted if you stopped by and said hello.

If you are attending, we know you will be busy and there are a lot of booths to visit.  
Here are the top ten reasons YOU should visit the Clinical Architecture booth.  

Top Ten Reasons to Visit the CA Booth at HIMSS15


  • Experience Symedical, a modern commercial terminology management platform, in action.
  • Try to beat Symedical in a map challenge. (Can you map 10 before it maps 1000? 10,000?...puny human)
  • Check out the new SIFT engine and see the future of Clinical Language Processing
  • Terminology Themed Chocolate
  • Meet members of the Clinical Architecture team and ask questions… for FREE!
  • Learn how true software assisted semantic mapping can improve quality and reduce your costs.
  • Visit with our smart and savvy neighbors in the Apelon booth.
  • Check out our new Content Cloud and see how simple staying current can be.
  • See how you can easily manage terminologies across your entire enterprise or client base with our Content Governance Architecture.
  • Learn how you can leverage the best commercial semantic platform in your solution with our powerful APIs.


Fabulous Prizes

For my blog readers I am doing a little something special. 

If you register here and visit our booth (2074), you will receive an Amazon Fire TV stick. 
Supplies are limited so be quick about it.   

We will also be raffling off a $500 donation to the charity of one lucky winner’s choice, each day of the show.

Regardless of the reason, you should stop by booth 2074, meet some nice people, see some cool software and talk about healthcare terminology. 

That’s right… we know how to party.

I hope to see you there!


Posted: 4/6/2015 10:09:37 PM by Charlie Harp | with 1 comments

Understanding ICD-10-CM - Part I - Origin Story

Lately I have taken some interest in ICD-10-CM.  Be advised that I tend to write these blogs for the clinical engineer or architect, not for the informaticist, clinician, or nosologist (though I am happy to have them as readers as well).

To understand any terminology you first need to understand its history; why was it built and how has it evolved or mutated over time.

ICD-10, the basis for ICD-10-CM, along with other "international" classification taxonomies, is curated by the World Health Organization or WHO (not to be confused with "The Who" … though you can see signs of classification in many of their song titles).

The ICD in ICD-10-CM stands for “international classification of diseases”, the “10” is the revision, and the “CM” stands for “Clinical Modification”.

(It should be noted that there are other ICDs as well; ICD-O-3, for example, is an oncology classification used by cancer registries.  They are not related ontologically in any way.  But that is the subject for another post.)

A Long Time Ago in a Green House Far Far Away…

The origin of the ICD goes back to the 1700s.  French physician and botanist, François Boissier de Sauvages de Lacroix, published his treatise, "Nosologia Methodica" in 1763.  Sauvages, as he was known, was one of the early nosologists (nosology = the branch of medical science dealing with the classification of diseases) that inspired many who followed in his footsteps.  The drivers for many of the early attempts at establishing these classification schemes were related to tracking morbidity statistics (take that population health!). Sauvages died at the age of 59, three years after his treatise was published.  In a twist of irony, I was unable to find a record of his cause of death.

Jump forward to 1860.  At the International Statistical Congress held in London, Florence Nightingale made a proposal that ultimately resulted in the first model of systematic collection of hospital data for the purposes of tracking causes of death.

In 1893, the “Bertillon Classification of Causes of Death” was introduced by Jacques Bertillon, a French physician.  This classification system was based on the principle of distinguishing between general diseases and those localized to a particular organ or anatomical site.  In 1900, it was adopted by the American Public Health Association (APHA) and was re-branded as the “International Classification of Causes of Death”.  At this time the ICD contained 161 primary codes (some with modifiers).  The code was a number and the modifier was a letter.  For example:

146. Burns
A) by fire
B) by corrosive substance

What is interesting is that this common way of presenting a list (that we all use) has over time become the pattern for structuring an identifier.
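
To make that concrete, here is a minimal sketch of how the list above could be flattened into identifiers.  The "146A" style of identifier and the function name are my own assumptions for illustration; they are not taken from the 1900 pamphlet.

```python
# Hypothetical illustration: flatten a numbered list with lettered
# sub-items into identifiers. The "146A" format is an assumption made
# for this sketch, not the notation used by the original classification.

def build_identifiers(number, title, modifiers):
    """Return a dict mapping each identifier to its description."""
    codes = {str(number): title}
    for letter, detail in modifiers.items():
        # The number becomes the code stem, the letter becomes the suffix.
        codes[f"{number}{letter}"] = f"{title} {detail}"
    return codes


burns = build_identifiers(
    146, "Burns", {"A": "by fire", "B": "by corrosive substance"}
)

for code, description in burns.items():
    print(code, "-", description)
```

The point is only that the position of an item in a printed outline (number, then letter) ends up doing double duty as its identifier.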

I have included a few excerpts from the “prefatory” of the pamphlet provided to the assembled reviewers.

“The time is especially suitable for the general adoption of a uniform classification of causes of death, to the end that the mortality data of the coming century may be more thoroughly comparable than at present.”

“The Bertillon classification is not presented as by any means a perfect system of classification of causes of death. No perfect system has ever been devised, and should there be, the progress of medical science would in time render it obsolete.”

When you reflect on it, it is both amazing and terrible to consider that the topics and issues we discuss today are not that different from those discussed over a century ago.  I especially like the last sentence, as it shows they understood that terminologies in healthcare are not static and will undoubtedly drift over time … a concept many people fail to grasp in healthcare IT today.

At the conference in 1900 they also decided that they would meet every 10 years to review and revise the ICD.

What we have here is an origin story.  As with every good origin story, you examine the event that created our hero (or villain) to determine why they chose the path we find them on today.  For ICD, it started out about death.  “What killed these people?” was the question it was meant to answer and track.  It was also about statistics, not about a patient (they were already dead…).  This makes sense because at that time the available technology (paper) could only be used retrospectively and with much effort.

So, like time, the ICD marches on…

For many cycles minor revisions were made.  In 1946 the 6th revision (ICD-6) was released. This revision expanded the ICD to include morbidity as well as mortality conditions and was renamed accordingly to “International Statistical Classification of Diseases, Injuries and Causes of Death”.  The addition of injuries and diseases also introduced the need to add new modifiers for anatomic location.  This revision also introduced the alphabetic index to accompany the tabular index to aid in finding the appropriate code as the number of terms had increased significantly.   

This is the point at which ICD shifted from a focus on death to a focus on people's problems … and death.  This allowed people that were tracking statistics retrospectively to evaluate not just what killed people but also what was hurting and afflicting people. This was especially useful when assessing the labor force or your military capabilities.

The 7th (1955) and 8th (1965) revisions were limited mostly to corrections and minor updates. 


The 9th revision (ICD-9) was introduced in 1975 and once again ICD was expanded.  This time there was a contingent who wanted to use the ICD to evaluate medical care.  To do this they wanted to revise the taxonomy itself to have an organ system focus and wanted additional detail. This was in part due to the advent of data processing systems (and might be loosely related to the popularity of disco … but I have no evidence to support that).

Up to this point, consumers had been using ICD for statistics and retrospective reporting for decades.  Now, a new group of consumers is introduced, one that needs a code system in order to document what is happening to a patient for tracking and billing.  One use is population reporting and the other is, arguably, clinical.  The two groups disagreed about how they should proceed at the time, based upon the two different use cases.  In the end, they decided to compromise and leverage the ICD for these two distinctly different purposes.  This is the point, dear reader, when our hero became a villain.  (Insert ominous music of your choice here.)

Why would I say this? I believe when you take a terminology designed to solve a specific problem and try to use it to solve another, different, problem you create an architectural compromise. Once compromised, the appropriateness of that terminology for either use case becomes fuzzy, as each end pulls aspects of the architecture to suit their disparate purposes.  After a while, the terminology is so compromised that it becomes unwieldy and is not suitable for any purpose. (Climbing down from my soapbox…)

It has been almost 40 years since ICD-9 was introduced and we, here in the United States, are still essentially on that revision. The most recent version has about 17k terms.


ICD-10 was introduced in 1990 (yes – a quarter century ago).  This international version of the classification is used in about 100 countries for cause of death reporting and statistics.  Some countries, like Australia and Canada, have modified versions.

The current version of the standard international ICD-10 has just shy of 18,000 terms.  Its major difference from ICD-9 is the coding scheme, which has shifted from numeric to alpha-numeric (because remembering codes is like cheating...).


Today in the United States we are set to adopt an expanded version of ICD-10 called ICD-10-CM, where the CM stands for Clinical Modification.  The "modified" version is significantly different from ICD-10 in that it expands each of the standard ICD-10 codes with additional granularity (like laterality, trimester, encounter type and complications).  It also adds a number of causes, places, activities and other healthcare codes.  The net result is a classification system with over 90,000 codes.  

ICD-10-CM is the version we will explore in the next post in this series.

Here is a graphical timeline showing the classification name, focus and population.  I am not a historian so if I missed something or got something wrong, feel free to let me know.


Posted: 3/30/2015 11:50:27 PM by Charlie Harp