What seems to be the Problem?
International Classification of Diseases (ICD)
International Classification of Diseases is a publication from the World Health Organization (WHO) and it provides a number of vocabularies for expressing disease concepts.
The history of the ICD is available here: http://www.who.int/classifications/icd/en/HistoryOfICD.pdf
ICD-9-CM
The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) is based on the World Health Organization's Ninth Revision, International Classification of Diseases (ICD-9).
ICD-9-CM is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.
The structure of ICD-9-CM codes is relatively straightforward. The code itself is an explicit hierarchy with the primary disease characteristic typically represented by the first part of the code and the secondary characteristics grouped in numeric sequence in the second part of the code.

As the example below shows, you should always treat an ICD-9-CM code as text and never as a number. Read numerically, 403.10 and 403.1 become the same value even though they are different codes, which would be a disaster.
Below are the ICD-9-CM codes representing 'hypertensive chronic kidney disease':
403 Hypertensive chronic kidney disease
403.0 Hypertensive chronic kidney disease, malignant
403.00 Hypertensive chronic kidney disease, malignant, with chronic kidney disease stage I through stage IV, or unspecified
403.01 Hypertensive chronic kidney disease, malignant, with chronic kidney disease stage V or end stage renal disease
403.1 Hypertensive chronic kidney disease, benign
403.10 Hypertensive chronic kidney disease, benign, with chronic kidney disease stage I through stage IV, or unspecified
403.11 Hypertensive chronic kidney disease, benign, with chronic kidney disease stage V or end stage renal disease
403.9 Hypertensive chronic kidney disease, unspecified
403.90 Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage I through stage IV, or unspecified
403.91 Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease
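To make the text-versus-number point concrete, here is a minimal Python sketch using a few of the codes listed above. The variable names are mine and purely illustrative; the only thing it assumes is that a loader might be tempted to read the code column as a floating point number.

```python
# A few ICD-9-CM codes from the list above, held as text.
codes = ["403", "403.0", "403.00", "403.1", "403.10"]

# Read as numbers, distinct codes collapse into the same value:
# "403", "403.0" and "403.00" all become 403.0, and "403.1" and
# "403.10" both become 403.1.
as_numbers = {float(code) for code in codes}

# Kept as text, every code stays distinct, and the hierarchy can be
# recovered with a simple prefix test.
as_text = set(codes)
children_of_403_1 = [c for c in codes if c.startswith("403.1") and c != "403.1"]

print(len(as_text), "distinct codes as text")
print(len(as_numbers), "distinct values as numbers")
print("children of 403.1:", children_of_403_1)
```

The prefix test only works because the code stays a string, which is the whole argument for storing it that way.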
Note: This manner of establishing codes is less than ideal. A smart code is an identifier that implies meaning through its structure. Smart codes typically become fraught with issues as a coding scheme grows more complex over time. For example, ICD-9 has no good way to express a disease or procedure that belongs in more than one place in the hierarchy (poly-hierarchical) without creating duplicate concepts (which is bad).
There are roughly 22,000 ICD-9-CM codes.
The ‘Home Page' of ICD-9-CM is http://www.cdc.gov/nchs/icd/icd9cm.htm
Wikipedia has a fairly robust ICD-9 list and reference capability here: http://en.wikipedia.org/wiki/List_of_ICD-9_codes.
Like all public standards, it is not provided in a format that makes it easy to use. The downloads available via FTP are rich text files that are human readable but not easy to parse into a typical application-consumable vocabulary file.
There are a number of web sites that provide search and lookup tools for ICD-9, but the only source of free coded ICD-9-CM codes (that I have found) is the UMLS Metathesaurus (also not easy...).
If you want easy and well structured you need to pay...
I recommend Ingenix at the following link: http://www.shopingenix.com/Category/100093/Product/16699/
ICD-10-CM
Like ICD-9-CM, ICD-10-CM is based on the World Health Organization ICD-10 coding system. ICD-10 is designated to replace ICD-9 and is a more granular terminology (actually more like SNOMED-CT).
The structure of ICD-10-CM is different than ICD-9. The codes are alphanumeric where the initial alpha code delineates the codes into 22 chapters.
Below are the ICD-10-CM codes representing ‘hypertensive chronic kidney disease':
I120 Hypertensive chronic kidney disease with stage V chronic kidney disease or end stage renal disease
I129 Hypertensive chronic kidney disease with stage I through stage IV chronic kidney disease, or unspecified chronic kidney disease
There are roughly 68,000 ICD-10-CM codes.
The structure of an ICD-10 code is depicted below (thanks to the AHIMA website).

There is a good primer on the differences between ICD-9 and ICD-10 on the AHIMA website here: http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_038084.hcsp?dDocName=bok1_038084
The ‘Home Page' of the ICD-10-CM is http://www.cdc.gov/nchs/icd/icd10cm.htm
Wikipedia has a fairly robust ICD-10-CM list and reference capability here: http://en.wikipedia.org/wiki/ICD-10.
RTF was apparently too easy as ICD-10-CM is published as a PDF file...
ICD-10-CM can also be pulled from the UMLS Metathesaurus and purchased in convenient formats from Ingenix.
SNOMED-CT
SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms) is a comprehensive clinical terminology, originally created by the College of American Pathologists (CAP) and, as of April 2007, owned, maintained, and distributed by the International Health Terminology Standards Development Organization (IHTSDO).
SNOMED-CT does not use a hierarchical code the way the ICD vocabularies do. Rather, SNOMED-CT creates meaningless identifiers and relates them to each other in a directed acyclic graph or DAG (which is where the phrase DAG-gumit! originated... I am pretty sure). This means that any term in the vocabulary can be related to zero-to-many terms, as long as it cannot end up being its own parent. The relationships themselves are separate from the SNOMED-CT code. SNOMED-CT also separates the notion of concepts and descriptions (or concept synonyms).
Below are the SNOMED-CT codes representing 'chronic kidney disease':
431855005|disorder|Chronic kidney disease stage 1
431856006|disorder|Chronic kidney disease stage 2
433144002|disorder|Chronic kidney disease stage 3
431857002|disorder|Chronic kidney disease stage 4
433146000|disorder|Chronic kidney disease stage 5
There are roughly 68,000 active disorder concepts in SNOMED-CT.
I have created a number of posts (and a few screencasts) on SNOMED-CT so I would first direct you to earlier posts in this blog.
The main NLM page for SNOMED-CT is located here: http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html
The SNOMED-CT user's guide is downloadable here: http://www.ihtsdo.org/fileadmin/user_upload/Docs_01/SNOMED_CT/About_SNOMED_CT/Use_of_SNOMED_CT/SNOMED_CT_User_Guide_20090731.pdf
The main CAP page for SNOMED-CT is located here: http://www.cap.org/apps/cap.portal?_nfpb=true&_pageLabel=snomed_page
The Wikipedia page for SNOMED-CT is located here: http://en.wikipedia.org/wiki/SNOMED_CT
You can download SNOMED-CT release files from the NLM site here: http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html
Note: to download NLM data files, like SNOMED-CT, you need to register and obtain a license from the NLM. You can do that here http://wwwcf.nlm.nih.gov/umlslicense/snomed/license.cfm
The next post will cover procedure terminologies.
For those of you evaluating the use of the SNOMED-CT Core Subset, you need to be aware that the NLM has made some non-trivial changes to the format and content of the subset file in the latest (second) release dated 200908 (July).
If you have developed a load program, as we have, that uses the subset file to identify concepts that are included in the subset, it is likely you will need to modify that program.
Here is a summary of the changes:
Term Changes:
- Nine terms were added and eleven terms were retired from the core subset.
New Terms:
SNOMED_CID | SNOMED_FSN | SNOMED_CONCEPT_STATUS |
208892001 | Closed traumatic dislocation of hip (disorder) | Current |
165468009 | Erythrocyte sedimentation rate (ESR) raised (finding) | Current |
197321007 | Steatosis of liver (disorder) | Current |
40733004 | Infectious disease (disorder) | Current |
165346000 | Laboratory test result abnormal (situation) | Current |
442234001 | Serum cholesterol borderline high (finding) | Current |
442438000 | Influenza due to Influenza A virus (disorder) | Current |
442551007 | Dental caries extending into dentine (disorder) | Current |
4557003 | Preinfarction syndrome (disorder) | Current |
Retired Terms:
SNOMED_CID | SNOMED_FSN | SNOMED_CONCEPT_STATUS |
41006004 | Depression (finding) | Ambiguous |
309158009 | Laboratory finding abnormal (navigational concept) | Current |
371330000 | Fatty liver (disorder) | Duplicate |
131016008 | Increased thyroid stimulating hormone level (finding) | Duplicate |
166829003 | Serum cholesterol borderline (finding) | Ambiguous |
191415002 | Communicable disease (navigational concept) | Current |
78431007 | Influenza due to Influenza virus, type A, human (disorder) | Ambiguous |
416103000 | Elevated erythrocyte sedimentation rate (finding) | Duplicate |
50047001 | Compound dental caries (disorder) | Ambiguous |
63079007 | Closed traumatic dislocation of hip joint (disorder) | Duplicate |
64333001 | Preinfarction angina (disorder) | Duplicate |
File Structure Changes:
June Subset | July Subset | Change |
SNOMED_CID | SNOMED_CID | - |
FSN | SNOMED_FSN | Name Change |
CONCEPT_STATUS | SNOMED_CONCEPT_STATUS | Name Change Now uses Description instead of Code!!! |
UMLS_CUI | UMLS_CUI | - |
OCCURRENCE | OCCURRENCE | - |
USAGE | USAGE | - |
- | FIRST_IN_SUBSET | New Field (YYYYMM) |
IS_RETIRED | IS_RETIRED_FROM_SUBSET | Name Change |
- | LAST_IN_SUBSET | New Field (YYYYMM) |
- | REPLACED_BY | New Field (SNOMED-CT Concept ID) |
New Fields:
New Field | What is it? |
FIRST_IN_SUBSET | This is the issue year and month when the concept first appeared in the subset. |
LAST_IN_SUBSET | This is the issue year and month when the concept last appeared in the subset as a non-retired concept. |
REPLACED_BY | Concept ID of the concept replacing a retired concept. |
OUCH!
If you developed a program that loads the core subset file this update likely broke it.
If you are using a text ODBC/OLEDB driver to load the file the name changes to the columns broke it.
If you are accessing the fields using sequential access and splitting the fields using the pipe delimiter, the insertion of the FIRST_IN_SUBSET before the IS_RETIRED fields will break your load program.
If you created a function that uses the coded values in the CONCEPT_STATUS field to support your load logic, that is now broken by the switch to the text value. (I don't understand this change at all. It seems to run contrary to the move away from free text. I would change it back...)
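For what it is worth, here is a minimal Python sketch of a loader that reads the file by header name rather than by position and tolerates both sets of column names. The rename pairs and column order come from the File Structure Changes table above, and the coded value "0" for a current concept comes from the original record format. Everything else is an assumption: the function names are mine, the sample rows are made up (they are not records from the NLM file), and I have not verified details such as whether the real file ends each record with a trailing pipe.

```python
import csv
import io

# Map the July column names back to the June names so downstream code
# sees one stable set of keys.
RENAMED = {
    "SNOMED_FSN": "FSN",
    "SNOMED_CONCEPT_STATUS": "CONCEPT_STATUS",
    "IS_RETIRED_FROM_SUBSET": "IS_RETIRED",
}

def load_subset(text):
    """Read a pipe-delimited subset file by header name, not by position."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = [RENAMED.get(name.strip(), name.strip()) for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        rows.append(dict(zip(header, (f.strip() for f in fields))))
    return rows

def is_current(row):
    """Accept the June coded value ("0") and the July text value ("Current")."""
    return row["CONCEPT_STATUS"] in {"0", "Current"}

# Made-up sample records in each layout (hypothetical values throughout).
june = (
    "SNOMED_CID|FSN|CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|IS_RETIRED\n"
    "0000000|Example term (disorder)|0|C0000000|7|1.0|False\n"
)
july = (
    "SNOMED_CID|SNOMED_FSN|SNOMED_CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|"
    "FIRST_IN_SUBSET|IS_RETIRED_FROM_SUBSET|LAST_IN_SUBSET|REPLACED_BY\n"
    "0000000|Example term (disorder)|Current|C0000000|7|1.0|200907|False|200908|\n"
)

june_row = load_subset(june)[0]
july_row = load_subset(july)[0]
print(june_row["FSN"], is_current(june_row))
print(july_row["FSN"], is_current(july_row), july_row["IS_RETIRED"])
```

Keying on the header keeps an inserted column from shifting every later field, and the alias map confines a rename to one place in the program.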
Needless to say, this update was a painful one for the early adopter. But if you have already created logic based on the inaugural release of the core subset data, an early adopter is what you are, and that is not without risks.
Along with the painful changes that left our load program writhing on the ground, clutching its face and yelling "You broke my nose!" are some new useful additions.
The FIRST_IN_SUBSET, LAST_IN_SUBSET and REPLACED_BY_SNOMED_CID are useful lifecycle management fields that will help with the management of term availability.
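As a rough illustration of how the replacement field could be used, the Python sketch below follows replacement links from a retired concept to whatever replaced it. It assumes the file has already been loaded into a dictionary of concept ID to replacement ID, and that a replacement might itself be retired later (I do not know whether the NLM allows such chains). The function name and the identifiers are hypothetical placeholders, not real SNOMED-CT concept IDs.

```python
def resolve_current(concept_id, replaced_by):
    """Follow replacement links until reaching a concept with no replacement."""
    seen = set()
    while replaced_by.get(concept_id):
        if concept_id in seen:
            # Guard against bad data that points back at itself.
            raise ValueError(f"replacement cycle at {concept_id}")
        seen.add(concept_id)
        concept_id = replaced_by[concept_id]
    return concept_id

# Hypothetical placeholder identifiers. An empty string means the
# concept has not been replaced.
replaced_by = {"OLD-1": "OLD-2", "OLD-2": "NEW-1", "NEW-1": ""}

print(resolve_current("OLD-1", replaced_by))
print(resolve_current("NEW-1", replaced_by))
```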
Patience is a Virtue
If this update frustrated you, I would ask that you focus on the positive and consider that the Core subset is another in a growing line of great, "FREE" work products from our friends at the NLM.
It is also worth noting that as we in the HIT industry leverage SNOMED-CT, RxNorm and LOINC the bar will continue to be raised in terms of update frequency and format stability. From the interactions I have had with the NLM, I expect that they are paying attention and will be responsive as we evolve and leverage them more.
Free Advice
As someone who worked at a commercial content provider, I would encourage the following with respect to all data products.
1.) Do not change field/column names lightly if they are included in the file, as developers will leverage that with a text driver to load the information.
2.) Avoid inserting fields into a record, as some load programs will operate based on field order. If you append new fields to the end of the record you will be less likely to disrupt the load.
3.) Coded fields are better than text fields...always.
Regardless of the constructive criticism...this is good stuff. If we at Clinical Architecture can help you better take advantage of it, give us a call!
I have spent some time recently looking into terminologies that are used to represent problem lists. Specifically the terminologies in question are the current de facto standard ICD-9, its successor ICD-10CM and the big Kahuna SNOMED-CT.
This is a good time to be looking at these, as the healthcare IT industry is starting to evolve electronic patient records (regardless of what acronym you use for it) to be a more useful and accurate picture of the patient's state. Until recently the patient's problems were (a) not represented in their electronic records, (b) represented using a home grown code that, aside from providing a display name, provided little support for leveragable decision support or (c) represented using an ICD-9CM code.
As I reviewed these terminologies, I ran across something that concerned me. The concern is with regard to concatenated terms in these terminologies.
What is a Concatenated Term?
A concatenated or combination term is a term that is, in actuality, multiple terms combined together under a single concept identifier.
Here is an example in ICD9CM:
404 - Hypertensive heart and chronic kidney disease
As opposed to the individual single terms:
402 - Hypertensive heart disease
585 - Chronic kidney disease (CKD)

In some cases it can be even more interesting:
404.00 - Hypertensive heart and chronic kidney disease, malignant, without mention of heart failure and with chronic kidney disease stage I through stage IV, or unspecified
When I saw this in the ICD9CM, I assumed it was a legacy issue and existed because ICD9CM has been around for a while. So I looked in ICD10-CM...
I132 - Hypertensive heart and chronic kidney disease with heart failure and with stage V chronic kidney disease, or end stage renal disease
And then I looked at SNOMED-CT...
194779001 - Hypertensive heart and renal disease with (congestive) heart failure
So, as it turns out, it is still an ongoing practice to create concatenated terms. This is not a legacy thing, or it is a legacy thing that has found its way into the terminologies we are supposed to use as we move forward.
I want to stop at this point and remind the reader that I am not a clinical person. I am a simple country programmer and guerilla informaticist. So there might be a reason for this I am not seeing. If there is, I want to know and I will post it here as a follow up.
Here is what was said on the American Health Information Management Association (AHIMA) website in an article about ICD10-CM enhancements regarding combination codes.
"ICD-10-CM includes combination codes for conditions and common symptoms or manifestations. A single code may be used to classify two diagnoses, a diagnosis with an associated sign or symptom, or a diagnosis with an associated complication. This allows one code to be assigned, resulting in fewer cases requiring more than one code and reducing sequencing problems.
Coding professionals have encountered sequencing dilemmas when coding conditions such as unstable angina with arteriosclerotic heart disease or diabetes mellitus with a complication/manifestation such as diabetic nephropathy. Manifestation/etiology conditions require two codes with sequencing mandated by ICD-9-CM. Brackets in the index identified that the etiology (diabetes mellitus) code was sequenced before the manifestation (diabetic nephropathy).
In cases when ICD-9-CM did not indicate correct sequencing, coding was not clear-cut, and Coding Clinic advice was needed to sequence conditions appropriately. The ICD-10-CM use of combination codes has greatly simplified this process."
My concern is how we are using terminology and the compromises we need to make to drive a specific purpose.
In the case of patient problem terminologies, it can boil down to:
(a) Are we trying to solve a sequencing problem to support reimbursement?
or
(b) Are we trying to allow the computer to assist with the heavy lifting by providing terms that can drive decision support?
Concatenated terms can assist with providing a refined display, with a single click of the patient's complex problem and, apparently, assist with sequencing for coding purposes. However, this can come at the expense of accurate and fast decision support.
Why do I say this?
When you create a decision support rule in healthcare, very often the rule fires based on the juxtaposition of elements of the patient's clinical context. For example, a drug-drug interaction is an overlap of two drugs that are currently, or about to be, prescribed on a patient's medical record. Similarly, a contraindication is the overlap of a medication and a patient problem that is known to create a patient safety risk.
By way of example, let's take a fictional medication 'Nancycillin' and say that it is absolutely contraindicated in patients with Hypertensive heart disease. If we use ICD9CM, in order to support that rule in our engine, we may set up the following rule:
Nancycillin [is absolutely contraindicated in patients with] (402) Hypertensive heart disease
If I set the above simple rule and the patient has the following in their EMR
Fred Flintstone
Problems:
- Hypertensive Heart Disease (402)
- Hypertensive renal disease, unspecified (403.9)
- Renal Failure (586)
The rule would fire just fine.
However, if I had added instead:
The rule would not fire and my patient might have a problem.
To handle the concatenated term we need to set up a collection of rules to cover the potential concatenated terms as follows:
Nancycillin [is absolutely contraindicated in patients with] ...
CODE | Term Description |
404.1 | Hypertensive heart and renal disease, benign |
404.11 | Hypertensive heart and renal disease, benign, with heart failure |
404.13 | Hypertensive heart and renal disease, benign, with heart failure and renal failure |
404.12 | Hypertensive heart and renal disease, benign, with renal failure |
404.10 | Hypertensive heart and renal disease, benign, without mention of heart failure or renal failure |
404.9 | Hypertensive heart and renal disease, unspecified |
404.91 | Hypertensive heart and renal disease, unspecified, with heart failure |
404.93 | Hypertensive heart and renal disease, unspecified, with heart failure and renal failure |
404.92 | Hypertensive heart and renal disease, unspecified, with renal failure |
404.90 | Hypertensive heart and renal disease, unspecified, without mention of heart failure or renal failure |
402 | Hypertensive heart disease |
This is a potential issue, as the people setting up rules are not always in control of the terminology and I would imagine that next week a new concatenated term with ‘hypertensive heart disease' could be added to the terminology and there would still be a terminological gap.
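The gap is easy to see in code. Below is a minimal Python sketch of a rule that fires on a plain code match, which is my simplifying assumption about how a rule engine might work and not a description of any particular product. Fred's problem list and the expanded rule codes come from the text above. The second problem list is hypothetical, chosen only to show a patient whose heart disease was recorded with a concatenated term.

```python
def rule_fires(rule_codes, problem_codes):
    """A rule fires when any problem on the list matches a code in the rule."""
    return any(code in rule_codes for code in problem_codes)

# The simple Nancycillin rule: contraindicated with 402.
simple_rule = {"402"}

# Fred's problem list as given above.
fred = ["402", "403.9", "586"]

# Hypothetical problem list that uses a concatenated term in place of 402.
fred_concatenated = ["404.91", "586"]

# The expanded rule covering the concatenated terms from the table above.
expanded_rule = {
    "402",
    "404.1", "404.10", "404.11", "404.12", "404.13",
    "404.9", "404.90", "404.91", "404.92", "404.93",
}

print(rule_fires(simple_rule, fred))
print(rule_fires(simple_rule, fred_concatenated))
print(rule_fires(expanded_rule, fred_concatenated))
```

The expanded rule only works for as long as its hand-maintained list keeps up with the terminology, which is exactly the maintenance burden described above.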
Relationships to the rescue?
This is a problem in ICD9-CM and ICD10-CM, what about SNOMED-CT? SNOMED-CT, as I mentioned, has concatenated terms, but it also has something else... relationships. Can the relationships in SNOMED-CT bridge this issue?
In SNOMED-CT, a concatenated term can be broken into its antecedents by following the 'is a' relationship type in the relationships table.
Let's take our SNOMED-CT example of:
Hypertensive heart and renal disease with (congestive) heart failure
It has the following related terms in the SNOMED_CT Relationships Table:

It breaks down the term into parent terms using the ‘Is a' relationship type. But if you look at the second one, you see that it is also a concatenated term. When we break that one down we get the following:

It is worth noting that each of these terms also breaks down (or breaks UP) to a parent or parents. Using the relationship structure, when I am done the term 'Hypertensive heart and renal disease with (congestive) heart failure' results in 72 'Is a' antecedents.
In order to accurately process my rules using the relationship table, I need to traverse the SNOMED-CT relationships across 72 antecedents, for this one term alone. Keep in mind, this has to be done for every term because I have no way of knowing which terms are concatenated and which ones are single.
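The traversal itself is not complicated; the cost comes from how far it fans out. Here is a minimal Python sketch of collecting every antecedent by walking 'is a' links upward. The graph below is a tiny hypothetical stand-in with placeholder labels, not the real SNOMED-CT relationships table, and the function name is my own.

```python
from collections import deque

def ancestors(concept, parents):
    """Return every concept reachable by following 'is a' links upward."""
    found = set()
    queue = deque(parents.get(concept, ()))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # reached earlier through another parent
        found.add(current)
        queue.extend(parents.get(current, ()))
    return found

# Hypothetical 'is a' links (child -> parents), using placeholder labels
# in place of real concept identifiers.
parents = {
    "heart and renal disease with heart failure": [
        "heart and renal disease",
        "heart failure",
    ],
    "heart and renal disease": ["heart disease", "renal disease"],
    "heart failure": ["heart disease"],
    "heart disease": ["disease"],
    "renal disease": ["disease"],
}

result = ancestors("heart and renal disease with heart failure", parents)
print(len(result), sorted(result))
```

Even this toy graph returns the generic 'disease' node alongside the useful antecedents, and nothing in the 'is a' links says which of the results are the actual components of the concatenated term.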
So what's missing?
In order to deal with the issues, there are a few things that would come in handy.
Concatenated Term Indicator
A flag on a term that indicates that it is comprised of more than a single concept would provide the savvy terminology consumer with the ability to apply special processing or avoid concatenated terms altogether.
Concatenated Term to Components Relationship
If these terminologies provided relationships to their component terms that would be even better. This would not be like the ‘is a' relationships in SNOMED-CT. The ‘Is a' relationship is an ontological relationship that relates concepts into the conceptual framework of SNOMED-CT. This would be a compositional link relating a concatenated term to the single conceptual terms that, when combined, are the conceptual equal of the concatenated term. In this relationship structure a concatenated term would never link to another concatenated term.
As an added bonus, if this type of relationship is created it can serve to identify which terms are concatenated.
This could be done within the existing SNOMED-CT structure by creating a new relationship type like the 'is a' relationship but based purely on compositional relationships.
If this type of information was available, it could be used when processing clinical decision support to automatically include all concatenated terms that the included term is a component of.
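To sketch what that could look like, the Python below assumes a hypothetical compositional table that links a concatenated code to its single-concept components. No such table exists in these terminologies today, so the structure and function names are entirely my invention. The one entry uses the ICD9CM example from earlier in this post, where 404 combines 402 and 585.

```python
# Hypothetical compositional links: concatenated code -> component codes.
components = {
    "404": ["402", "585"],
}

def is_concatenated(code, components):
    """The presence of a compositional link doubles as the concatenated flag."""
    return code in components

def expand_rule(rule_codes, components):
    """Add every concatenated code that contains one of the rule's codes."""
    expanded = set(rule_codes)
    for combined, parts in components.items():
        if expanded.intersection(parts):
            expanded.add(combined)
    return expanded

rule = {"402"}
print(sorted(expand_rule(rule, components)))
print(is_concatenated("404", components), is_concatenated("402", components))
```

Because a concatenated term would never link to another concatenated term in this structure, a single pass over the table is enough and no recursive traversal is needed.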
I am not suggesting this is a simple thing to do, especially when you have terms like the following:
Hypertensive heart and chronic kidney disease, malignant, without mention of heart failure and with chronic kidney disease stage I through stage IV, or unspecified
What I am suggesting is that in order to create targeted, effective clinical decision support there has to be an effective strategy to deal with concatenated terms. Without that you place a significant burden on the content authors to accurately cover related concatenated terms with a specific rule. If you decide to rely on the ontological relationships, in the case of SNOMED-CT, you run the risk of generating significant noise as it can't tell where to stop.
My suggestion, in the short term, for those of you looking into using ICD-9, ICD10 CM or SNOMED-CT to drive diagnosis, problem or procedure codes is to create policies that avoid adding concatenated terms into the subset you are providing to users for selection. This will help you to avoid the problem and prepare your application for an easier road when, in the future, you begin to try to use those codes to drive clinical decision support.
I received an email from the NLM UMLS Users listserv today with the following subject 'CORE Problem List Subset of SNOMED CT Now Available'. Being a UMLS enthusiast, I quickly downloaded the data and scoped it out. I thought I would share what I found out with you.
CORE Problem List Subset, What is it?
It is a subset of the complete SNOMED CT terminology that is designed to help implementers by acting as a starter set of codes.
There are 5182 terms in the core data released today as opposed to the roughly 386,000 terms in the complete SNOMED CT terminology.
Where did it come from?
The data released today is based on the datasets submitted by the following institutions:
- Beth Israel Deaconess Medical Center
- Intermountain Healthcare
- Kaiser Permanente
- Mayo Clinic
- Nebraska University Medical Center
- Regenstrief Institute
- Hong Kong Hospital Authority
Why was it created?
This new core subset can provide a vendor or institution with a starter set of common terms that are used to record clinical observations (in fact CORE stands for Clinical Observations Recording and Encoding).
What is in the file that you can download?
The terms that were selected are available in a pipe ‘|' delimited file with the following record format.
Position | Field name | Description |
1 | SNOMED_CID | This is the concept identifier for the term. (If you are a regular SNOMED enthusiast, this is the same ID that you would find in the SCT_CONCEPTS_yyyymmdd.txt file.) |
2 | FSN | Fully Specified Name (the term description) |
3 | CONCEPT_STATUS | This is the concept status of the concept ID in the SCT_CONCEPT file. According to the extraction rules for the core this should always be zero, which means 'current'. (Which also means you can probably ignore this field.) |
4 | UMLS_CUI | Concept Unique Identifier for this SNOMED CT concept in the UMLS Metathesaurus MRCONSO table. |
5 | OCCURRENCE | The number of contributing institutions that have the concept in their problem list (currently from 1-7). |
6 | USAGE | The sum of the usage of this term divided by 7. (I wonder if this would be better if it was the sum of the usage divided by the occurrence?? - I will follow up with NLM.) |
7 | IS_RETIRED | This is a field for the future to support when terms are retired. I would assume that the CONCEPT_STATUS field would also reflect that the SNOMED CT concept is no longer current. |
Note: I went back and forth on the USAGE field. I thought it was interesting that the sum of the usage was divided by the full count of seven and not the OCCURRENCE value. When you take the USAGE number, multiply it by seven and divide by the OCCURRENCE number, the result is, in most cases, a much higher value that reflects the usage of the term within the institutions that are actually using the term. If you are a big data nerd (like me), the variance in how the terms are ranked depending on which way you look at the usage is interesting. I am also interested in how the original institutional average was calculated. (Once again... nerd.)
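The rescaling described in the note is simple arithmetic. Here it is as a short Python sketch; the function name and the sample figures are made up for illustration and do not come from the subset file.

```python
CONTRIBUTORS = 7  # number of contributing institutions in this release

def usage_among_users(usage, occurrence):
    """Rescale USAGE to an average over only the institutions using the term."""
    return usage * CONTRIBUTORS / occurrence

# Made-up figures: a term with USAGE 0.5 that appears in 2 of the 7 lists.
print(usage_among_users(0.5, 2))

# A term used by all seven institutions is unchanged by the rescaling.
print(usage_among_users(0.5, 7))
```

A term used heavily by only a couple of institutions moves up the ranking under this view, which is why the two orderings differ.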
A Quick Look at the Data
When you take the supplied terms and sort them in order based on the USAGE number, here are the top 25 terms.

When I see this list it seems reasonable to me that these would have a higher usage in a problem or finding list. All of the terms are at a fairly high level and are the types of things you would expect to have a higher volume of occurrences.
Impressions
If you are just getting started with SNOMED CT and thinking about using it as a reference terminology for tracking findings and problems in your electronic medical record, this new CORE subset is a great starting point. Kudos to the NLM and the contributing institutions for providing this information - it should facilitate the implementation of SNOMED CT by providing a place to start.
For more information check out the full write up on the NLM website at:
http://www.nlm.nih.gov/research/umls/Snomed/core_subset.html