PLEASE NOTE: The Meaningful Use Final Rule was released on July 12 and the UNII is no longer listed as the standard for allergy terminology. In fact, there is NO standard listed for allergy interoperability. For the record, I do not think that the following blog post, which "aired" on June 30th, influenced the government's decision-making process in any way. My next post will suggest a significant stimulus for healthcare IT companies with the word 'architecture' in the name... just in case. In order to preserve history, I am leaving the post as it was. It still provides a decent overview of UNII for those of you who would like to leverage it.
The vocabulary chosen to represent patient allergies is the FDA Unique Ingredient Identifier or UNII (I guess ‘UII' would be a difficult acronym to use in casual conversation...).
The UNII is part of the Substance Registration System whose purpose is to provide unique identifiers for:
Foods
- Food substances are specific foods or components of food, regardless of whether the food is in conventional food form or a dietary supplement, such as vitamins, minerals, herbs, or other similar nutritional substances.
Drugs
- Drug substances include both active and inactive ingredients used in drug products, including those for veterinary purposes.
Biologics
- Biologic substances include both active and inactive ingredients used in biologics, such as blood products, therapeutic products, vaccines, cellular and gene therapy products, allergenic products, tissues, and certain devices (e.g., enzymes in stabilized solutions).
Devices
- Device substances include certain components of some devices (e.g. silicon for implants, and chemical reagents for glucose test kits).
Cosmetics
- Cosmetic substances are components of cosmetic products, such as flavors, fragrances, colorants, vitamins, plant- and animal-derived ingredients, and polymers.
There is more general information on the UNII here: http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm
According to the above site, the UNII is:
- One of the core components of the United States Federal Medication Terminology.
- Used in the FDA's Structured Product Labeling
- Used to assist in the generation of the National Library of Medicine's (NLM's) RxNorm.
- A US government standard for drug ingredient and food allergen identifiers
- A component of the Environmental Protection Agency's Substance Registry System (future)
The UNII may be found in:
- NLM's Unified Medical Language System (UMLS)
- National Cancer Institute's Enterprise Vocabulary Service
- USP Dictionary of USAN and International Drug Names (future)
- FDA Data Standards Council website
- VA National Drug File Reference Terminology (NDF-RT)
- FDA Inactive Ingredient Query Application
The UNII is provided, rather inconveniently, in Excel format.
There is a multi-worksheet (A-S, T-Z), denormalized, zipped Excel workbook dated 6/25/2010 at the following location.
http://www.fda.gov/downloads/ForIndustry/DataStandards/StructuredProductLabeling/UCM217498.zip
The sheets are difficult to work with because they combine the concepts and their synonyms into a single list. It is also worth noting that, in the data provided, the synonyms do not have unique identifiers.
Sheet Structure
The primary sheets with the UNII codes in them have the following columns:
| Column | Description |
|---|---|
| Preferred substance name | This is the preferred name of the substance. |
| UNII | The unique identifier for the preferred substance name. |
| Substance name | A synonym for the preferred substance name. |
| ITIS TSN | This is not really documented, BUT I believe it is, where applicable, a code representing the USDA Integrated Taxonomic Information System (ITIS) Taxonomic Serial Number (TSN). This appears to be populated only for food ingredients. |
| Molecular Formula | This is, you guessed it, the molecular formula. It seems to be populated only for chemical ingredients. |
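Because each row repeats the concept next to one of its synonyms, the sheet can be split into a concept list and a synonym list. Below is a minimal sketch of that split, assuming the rows have already been read out of the workbook as tuples in the column order described above. The function name `normalize`, the output field names and the UNII values are all my own placeholders for illustration, not real codes and not the layout of any published file.

```python
# Column order assumed from the sheet description above:
# (preferred name, UNII, substance name, ITIS TSN, molecular formula)
# The UNII values are made-up placeholders, NOT real UNII codes.
ROWS = [
    ("ASPIRIN", "XXXXXXXXX1", "ASPIRIN", "", "C9H8O4"),
    ("ASPIRIN", "XXXXXXXXX1", "ACETYLSALICYLIC ACID", "", "C9H8O4"),
    ("PEANUT", "XXXXXXXXX2", "PEANUT", "", ""),
    ("PEANUT", "XXXXXXXXX2", "GROUNDNUT", "", ""),
    ("PEANUT", "XXXXXXXXX2", "GROUNDNUT", "", ""),  # duplicate row on purpose
]


def normalize(rows):
    """Split denormalized rows into one concept per UNII plus a synonym list."""
    concepts = {}   # UNII -> concept attributes
    synonyms = []   # one entry per distinct (UNII, name) pair
    seen = set()
    for preferred, unii, name, tsn, formula in rows:
        # Keep the first set of attributes seen for each UNII.
        concepts.setdefault(unii, {
            "preferred_name": preferred,
            "itis_tsn": tsn,
            "molecular_formula": formula,
        })
        if (unii, name) not in seen:
            seen.add((unii, name))
            synonyms.append({
                "unii": unii,
                "name": name,
                "is_preferred": name == preferred,
            })
    return concepts, synonyms


concepts, synonyms = normalize(ROWS)
print(len(concepts), "concepts,", len(synonyms), "synonyms")
```

Since the synonyms have no identifiers of their own in the source data, this sketch keys them on the (UNII, name) pair; a real load would probably assign a surrogate key instead.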

Code structure and design
The UNII code is a ten-character alphanumeric code. The first nine characters are randomly generated and the tenth character is determined by an algorithm (a check digit for you old timers who wrote serial port interfaces...).
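If you are loading the codes yourself, a quick shape check catches truncated or mangled values. This is only a sketch: it assumes the codes use digits and uppercase letters (my assumption, the post only says "alpha-numeric"), and it deliberately does NOT verify the check character, because the algorithm is not described here. The name `looks_like_unii` is hypothetical.

```python
import re

# Assumption: ten characters drawn from 0-9 and A-Z.
UNII_SHAPE = re.compile(r"[0-9A-Z]{10}")


def looks_like_unii(value: str) -> bool:
    """Return True if the value has the expected ten-character shape.

    This checks length and alphabet only; it does not validate
    the tenth (check) character.
    """
    return UNII_SHAPE.fullmatch(value.strip().upper()) is not None


print(looks_like_unii("ABCDE12345"))  # placeholder value, not a real UNII
print(looks_like_unii("ABC"))
```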
The Numbers
There are 16,655 unique UNII concepts in the provided list.
There are 67,715 synonyms, including the preferred names.
What's missing?
UNII Type:
We know that the scope of the UNIIs covers a number of types of substances. It would be very useful if there were a way of telling which UNIIs are of which type so that we could filter them. I may not want to include cosmetics OR biologics in my allergy pick list, for example.
Allergies:
Most systems that track allergies, medication allergies in particular, allow the user to represent allergies using medication ingredients, common brand names OR allergy classes. The UNII scope only covers one of these. How will we use UNII to represent a documented allergy or adverse reaction to ‘Nyquil' or ‘cephalosporins'? Also, if you are going to represent allergies, should the list include animal and environmental allergies?
Not a Rant
I don't want to get off on a rant here... but it seems like for some of these meaningful use terminologies, rather than creating a terminology designed to support appropriate interoperability, we looked to see what we already had lying around. UNII is not an allergy terminology, it is a substance terminology. They are not the same thing. They are terminology domains that merely overlap. I know, creating a terminology is hard but, ahem, 19 billion dollars! This is not a criticism directed at the UNII codes or the people that maintain them. It looks like a very thorough substance terminology with a fairly simple design, but it will not support allergy interoperability as it should be supported. Now, we could change UNII terminology to include allergy classes, animals and environmental terms, but that would make it a less wonderful substance terminology then, wouldn't it? Perhaps a better approach would be to use UNII in our allergy interoperability terminology, in the utopian future, to represent substances (with types please) and we could append the other allergy types (classes, animals, environmental) to save money and reduce the deficit. I could live with that.
(I will now climb down from my virtual soap box, so that you can come out from behind your furniture...)
To make up for the non-rant, I am happy to provide a normalized version of the most recent UNII data for your experimentation. It is provided in a zip file as two pipe (‘|') delimited text files with the following structure.

If you would like to receive this file, contact us and ask for it. We will email it to you or provide you with access to our FTP server.
I want to thank Bonnie for reminding me that I should do this post.
I will try to post more frequently.
What seems to be the Problem?
International Classification of Diseases (ICD)
International Classification of Diseases is a publication from the World Health Organization (WHO) and it provides a number of vocabularies for expressing disease concepts.
The history of the ICD is available here: http://www.who.int/classifications/icd/en/HistoryOfICD.pdf
ICD-9-CM
The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) is based on the World Health Organization's Ninth Revision, International Classification of Diseases (ICD-9).
ICD-9-CM is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.
The structure of ICD-9-CM codes is relatively straightforward. The code itself is an explicit hierarchy with the primary disease characteristic typically represented by the first part of the code and the secondary characteristics grouped in numeric sequence in the second part of the code.

As the example below shows, you should always treat an ICD-9-CM code as text and never as a number. A numeric interpretation of the code would be a disaster: 403.1 and 403.10, for instance, are different codes.
Below are the ICD-9-CM codes representing ‘hypertensive chronic kidney disease':
403 Hypertensive chronic kidney disease
403.0 Hypertensive chronic kidney disease, malignant
403.00 Hypertensive chronic kidney disease, malignant, with chronic kidney disease stage I through stage IV, or unspecified
403.01 Hypertensive chronic kidney disease, malignant, with chronic kidney disease stage V or end stage renal disease
403.1 Hypertensive chronic kidney disease, benign
403.10 Hypertensive chronic kidney disease, benign, with chronic kidney disease stage I through stage IV, or unspecified
403.11 Hypertensive chronic kidney disease, benign, with chronic kidney disease stage V or end stage renal disease
403.9 Hypertensive chronic kidney disease, unspecified
403.90 Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage I through stage IV, or unspecified
403.91 Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease
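To make the "treat it as text" point concrete, here is a small sketch using the ten codes listed above. It shows how many distinct values survive a numeric conversion, and how the hierarchy can be walked by trimming the string. The helper name `parent_code` is mine, and trimming one trailing character is an assumption that holds for this example, not a general ICD-9-CM rule.

```python
CODES = [
    "403", "403.0", "403.00", "403.01",
    "403.1", "403.10", "403.11",
    "403.9", "403.90", "403.91",
]

# Converting to numbers silently merges different codes:
# "403", "403.0" and "403.00" all become the same float.
as_numbers = {float(code) for code in CODES}
print(len(CODES), "codes as text,", len(as_numbers), "distinct values as numbers")


def parent_code(code: str):
    """Return the next code up the hierarchy, or None at the top."""
    if "." not in code:
        return None
    head, tail = code.split(".")
    if len(tail) == 1:
        return head                      # "403.1"  -> "403"
    return head + "." + tail[:-1]        # "403.10" -> "403.1"


print(parent_code("403.10"), parent_code("403.1"), parent_code("403"))
```

Kept as strings, "403.1" and "403.10" stay distinct and the parent of each code can be recovered from the code itself, which is exactly the property a numeric column throws away.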
Note: This manner of establishing codes is less than ideal. ICD-9-CM codes are "smart codes": identifiers that imply meaning through their structure. Smart codes typically become fraught with issues as a coding scheme becomes more complex over time. For example, there is not a very good way to express a disease or procedure in ICD-9 if it belongs in more than one place in the hierarchy (poly-hierarchical) without creating duplicate concepts (which is bad).
There are roughly 22,000 ICD-9-CM codes.
The ‘Home Page' of ICD-9-CM is http://www.cdc.gov/nchs/icd/icd9cm.htm
Wikipedia has a fairly robust ICD-9 list and reference capability here: http://en.wikipedia.org/wiki/List_of_ICD-9_codes.
Like all public standards, it is not provided in a format that makes it easy to use. The downloads that are available via FTP are rich text files that are human readable but not easy to parse into a typical application-consumable vocabulary file.
There are a number of web sites that provide search and lookup tools for ICD-9, but the only source of free coded ICD-9-CM codes (that I have found) is the UMLS Metathesaurus (also not easy...).
If you want easy and well structured you need to pay...
I recommend Ingenix at the following link: http://www.shopingenix.com/Category/100093/Product/16699/
ICD-10-CM
Like ICD-9-CM, ICD-10-CM is based on the World Health Organization ICD-10 coding system. ICD-10 is designated to replace ICD-9 and is a more granular terminology (actually more like SNOMED-CT).
The structure of ICD-10-CM is different than ICD-9. The codes are alphanumeric where the initial alpha code delineates the codes into 22 chapters.
Below are the ICD-10-CM codes representing ‘hypertensive chronic kidney disease':
I120 Hypertensive chronic kidney disease with stage V chronic kidney disease or end stage renal disease
I129 Hypertensive chronic kidney disease with stage I through stage IV chronic kidney disease, or unspecified chronic kidney disease
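The two codes above are written without the decimal point that ICD-10-CM conventionally places after the third character (I12.0, I12.9). Data files often drop it, so a small helper for display can be handy. This is a sketch only; `with_decimal` is a name I made up.

```python
def with_decimal(code: str) -> str:
    """Insert the decimal point after the third character, if there is more."""
    code = code.strip().upper().replace(".", "")
    if len(code) <= 3:
        return code                  # category-level code, nothing to split
    return code[:3] + "." + code[3:]


for raw in ("I120", "I129", "I12"):
    print(raw, "->", with_decimal(raw))
```

As with ICD-9-CM, the codes stay as text throughout.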
There are roughly 68,000 ICD-10-CM codes.
The structure of an ICD-10 code is depicted below (thanks to the AHIMA website).

There is a good primer on the differences between ICD-9 and ICD-10 on the AHIMA website here: http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_038084.hcsp?dDocName=bok1_038084
The ‘Home Page' of the ICD-10-CM is http://www.cdc.gov/nchs/icd/icd10cm.htm
Wikipedia has a fairly robust ICD-10-CM list and reference capability here: http://en.wikipedia.org/wiki/ICD-10.
RTF was apparently too easy as ICD-10-CM is published as a PDF file...
ICD-10-CM can also be pulled from the UMLS Metathesaurus and purchased in convenient formats from Ingenix.
SNOMED-CT
SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms) is a comprehensive clinical terminology, originally created by the College of American Pathologists (CAP) and, as of April 2007, owned, maintained, and distributed by the International Health Terminology Standards Development Organization (IHTSDO).
SNOMED-CT codes do not have a hierarchical code, like the ICD vocabularies. Rather, SNOMED-CT creates meaningless identifiers and relates them to each other in a directed acyclic graph or DAG (which is where the phrase DAG-gumit! originated... I am pretty sure). This means that any term in the vocabulary can be related to zero-to-many terms, as long as it cannot end up being its own parent. The relationships themselves are separate from the SNOMED-CT code. SNOMED-CT also separates the notion of concepts and descriptions (or concept synonyms).
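Here is a minimal sketch of the idea: identifiers carry no meaning, relationships live in a separate structure, a concept may have several parents, and a new relationship is refused if it would make a concept its own ancestor. The class `ConceptGraph` is purely illustrative and has nothing to do with the actual SNOMED-CT release file format. The parent identifiers ("CKD", "KIDNEY_DISEASE", "CHRONIC_DISEASE") are hypothetical placeholders, not SNOMED-CT codes; only 431855005 comes from this post.

```python
from collections import defaultdict


class ConceptGraph:
    """Toy 'is a' graph: concept id -> set of parent concept ids."""

    def __init__(self):
        self.parents = defaultdict(set)

    def ancestors(self, concept_id):
        """Collect every concept reachable by following parent links."""
        seen = set()
        stack = list(self.parents.get(concept_id, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_is_a(self, child, parent):
        """Add child -> parent unless that would create a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} would become its own ancestor")
        self.parents[child].add(parent)


graph = ConceptGraph()
graph.add_is_a("CKD", "KIDNEY_DISEASE")    # hypothetical ids
graph.add_is_a("CKD", "CHRONIC_DISEASE")   # second parent: poly-hierarchy
graph.add_is_a("431855005", "CKD")         # Chronic kidney disease stage 1
print(sorted(graph.ancestors("431855005")))
```

Note that the poly-hierarchy problem described for ICD-9 goes away here: "CKD" sits under two parents without being duplicated.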
Below are the SNOMED-CT codes representing ‘chronic kidney disease':
431855005|disorder|Chronic kidney disease stage 1
431856006|disorder|Chronic kidney disease stage 2
433144002|disorder|Chronic kidney disease stage 3
431857002|disorder|Chronic kidney disease stage 4
433146000|disorder|Chronic kidney disease stage 5
There are roughly 68,000 active disorder concepts in SNOMED-CT.
I have created a number of posts (and a few screencasts) on SNOMED-CT so I would first direct you to earlier posts in this blog.
The main NLM page for SNOMED-CT is located here: http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html
The SNOMED-CT user's guide is downloadable here: http://www.ihtsdo.org/fileadmin/user_upload/Docs_01/SNOMED_CT/About_SNOMED_CT/Use_of_SNOMED_CT/SNOMED_CT_User_Guide_20090731.pdf
The main CAP page for SNOMED-CT is located here: http://www.cap.org/apps/cap.portal?_nfpb=true&_pageLabel=snomed_page
The Wikipedia page for SNOMED-CT is located here: http://en.wikipedia.org/wiki/SNOMED_CT
You can download SNOMED-CT release files from the NLM site here: http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html
Note: to download NLM data files, like SNOMED-CT, you need to register and obtain a license from the NLM. You can do that here http://wwwcf.nlm.nih.gov/umlslicense/snomed/license.cfm
The next post will cover procedure terminologies.
I recently had a request to create a post providing a primer on the vocabularies of meaningful use. Let's start with a review of the vocabularies that are named in the meaningful use criteria described on pages 21 and 22 of the January 13th release of the federal register located here:
http://edocket.access.gpo.gov/2010/pdf/e9-31216.pdf.
The "Chosen Ones"
The listed vocabularies and their purpose are as follows:
| Terminology | Stage | Purpose(s) |
|---|---|---|
| ICD-9-CM | Stage 1 | Problems |
| ICD-9-CM | Stage 1 | Procedures |
| ICD-10-CM | Stage 2 | Problems |
| ICD-10-PCS | Stage 2 | Procedures |
| SNOMED-CT | Stage 1 | Problems |
| SNOMED-CT | Stage 2 | Problems |
| SNOMED-CT | Stage 2 | Lab Results (Submission Public Health) |
| CPT-4 | Stage 1 | Procedures |
| CPT-4 | Stage 2 | Procedures |
| Third Party Drug Vocabularies* | Stage 1 | Medications |
| Third Party Drug Vocabularies* | Stage 1 | Electronic Prescribing |
| RxNorm | Stage 2 | Medications |
| RxNorm | Stage 2 | Electronic Prescribing |
| UNII | Stage 2 | Medication Allergies |
| CVX | Stage 1 | Immunization Registries |
| CVX | Stage 2 | Immunization Registries |
| LOINC | Stage 1 | Lab Orders (from Reference labs) |
| LOINC | Stage 1 | Lab Results (from reference labs) |
| LOINC | Stage 2 | Lab Orders (All) |
| LOINC | Stage 2 | Lab Results (All) |
| UCUM | Stage 2 | Units of Measure |
| CDA template | Stage 2 | Vital Signs |
* Third Party drug vocabularies that are listed as complete in RxNorm by the NLM
Meaningful Selection
What does it mean that a vocabulary is one of the chosen ones? My understanding (based on my reading of the federal register) is that the meaningful use criteria require that, to be certified, EHR technologies must provide patient summaries and interoperate (exchange data) using the listed vocabularies for their defined purposes. In other words, the vocabulary standards are for interoperability, not native persistence in the EHR application.
It is not reasonable to expect that every hospital / physician's office in the US will migrate their patient data to these standards (and then do it again for 2013). As a good application architect, your objective is to determine how you will be able to express your client's patient information in the anointed vocabularies.
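One way to picture "interoperability, not native persistence" is a translation step at the edge of the application: the EHR keeps its own codes and maps them to the standard only when it exports. The sketch below assumes exactly that. The local code values, the `LOCAL_TO_ICD9CM` table and the `export_problems` function are all hypothetical; only the ICD-9-CM code 403.90 is taken from the problem vocabularies post.

```python
# Hypothetical mapping from a site's local problem codes to ICD-9-CM.
LOCAL_TO_ICD9CM = {
    "PROB-0042": "403.90",  # Hypertensive chronic kidney disease, unspecified
}


def export_problems(local_codes):
    """Translate local codes for exchange; report what could not be mapped."""
    mapped, unmapped = [], []
    for local_code in local_codes:
        target = LOCAL_TO_ICD9CM.get(local_code)
        if target is None:
            unmapped.append(local_code)   # needs a mapping before exchange
        else:
            mapped.append({
                "local_code": local_code,
                "code_system": "ICD-9-CM",
                "code": target,           # kept as text, never a number
            })
    return mapped, unmapped


mapped, unmapped = export_problems(["PROB-0042", "PROB-9999"])
print(mapped)
print("unmapped:", unmapped)
```

The unmapped list is the interesting part: it is the to-do list for whoever maintains the mappings, and swapping the target vocabulary for Stage 2 means replacing the table rather than migrating the patient data.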
Where to Learn More
There is a lot I can say about these vocabularies, both their suitability to the tasks that have been so capriciously assigned to them and the challenges associated with working with each of them. This is not the post for that particular diatribe. In this post, I will try to give you some high level information and some places to find out more. So strap on your learning caps and practice your right click ‘open in new tab' skills.
The next post will cover the problem vocabularies.