Clinical Architecture Healthcare IT Blog

SNOMED-CT Core Subset – Significant Changes in July File

If you are evaluating the SNOMED-CT Core Subset, be aware that the NLM has made some non-trivial changes to the format and content of the subset file in the latest (second) release, dated 200908 (July).

If you have developed a load program, as we have, that uses the subset file to identify concepts that are included in the subset, it is likely you will need to modify that program.

Here is a summary of the changes:

Term Changes:

  • Nine terms were added and eleven terms were retired from the core subset.

New Terms:

| SNOMED_CID | SNOMED_FSN | SNOMED_CONCEPT_STATUS |
|---|---|---|
| 208892001 | Closed traumatic dislocation of hip (disorder) | Current |
| 165468009 | Erythrocyte sedimentation rate (ESR) raised (finding) | Current |
| 197321007 | Steatosis of liver (disorder) | Current |
| 40733004 | Infectious disease (disorder) | Current |
| 165346000 | Laboratory test result abnormal (situation) | Current |
| 442234001 | Serum cholesterol borderline high (finding) | Current |
| 442438000 | Influenza due to Influenza A virus (disorder) | Current |
| 442551007 | Dental caries extending into dentine (disorder) | Current |
| 4557003 | Preinfarction syndrome (disorder) | Current |

Retired Terms:

| SNOMED_CID | SNOMED_FSN | SNOMED_CONCEPT_STATUS |
|---|---|---|
| 41006004 | Depression (finding) | Ambiguous |
| 309158009 | Laboratory finding abnormal (navigational concept) | Current |
| 371330000 | Fatty liver (disorder) | Duplicate |
| 131016008 | Increased thyroid stimulating hormone level (finding) | Duplicate |
| 166829003 | Serum cholesterol borderline (finding) | Ambiguous |
| 191415002 | Communicable disease (navigational concept) | Current |
| 78431007 | Influenza due to Influenza virus, type A, human (disorder) | Ambiguous |
| 416103000 | Elevated erythrocyte sedimentation rate (finding) | Duplicate |
| 50047001 | Compound dental caries (disorder) | Ambiguous |
| 63079007 | Closed traumatic dislocation of hip joint (disorder) | Duplicate |
| 64333001 | Preinfarction angina (disorder) | Duplicate |

File Structure Changes:

| June Subset | July Subset | Change |
|---|---|---|
| SNOMED_CID | SNOMED_CID | - |
| FSN | SNOMED_FSN | Name Change |
| CONCEPT_STATUS | SNOMED_CONCEPT_STATUS | Name Change. Now uses Description instead of Code!!! |
| UMLS_CUI | UMLS_CUI | - |
| OCCURRENCE | OCCURRENCE | - |
| USAGE | USAGE | - |
| - | FIRST_IN_SUBSET | New Field (YYYYMM) |
| IS_RETIRED | IS_RETIRED_FROM_SUBSET | Name Change |
| - | LAST_IN_SUBSET | New Field (YYYYMM) |
| - | REPLACED_BY | New Field (SNOMED-CT Concept ID) |
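
A quick way to catch this kind of drift before a load runs is to compare the header you expect with the header you actually received. The sketch below is only an illustration: the two header lists are transcribed from the comparison above and assume the listed order is the column order in the file, and `header_drift` is a hypothetical helper name, not part of any NLM tooling.

```python
# Column lists transcribed from the June/July comparison in this article.
# Assumption: the order shown there is the column order in the file.
JUNE_HEADER = [
    "SNOMED_CID", "FSN", "CONCEPT_STATUS", "UMLS_CUI",
    "OCCURRENCE", "USAGE", "IS_RETIRED",
]
JULY_HEADER = [
    "SNOMED_CID", "SNOMED_FSN", "SNOMED_CONCEPT_STATUS", "UMLS_CUI",
    "OCCURRENCE", "USAGE", "FIRST_IN_SUBSET", "IS_RETIRED_FROM_SUBSET",
    "LAST_IN_SUBSET", "REPLACED_BY",
]


def header_drift(expected, actual):
    """Report columns that vanished, appeared, or changed position."""
    missing = [c for c in expected if c not in actual]
    added = [c for c in actual if c not in expected]
    moved = [
        c for c in expected
        if c in actual and expected.index(c) != actual.index(c)
    ]
    return {"missing": missing, "added": added, "moved": moved}


drift = header_drift(JUNE_HEADER, JULY_HEADER)
for kind, columns in drift.items():
    print(kind, columns)
```

A loader could call something like this on the first line of the file and refuse to continue when `missing` is non-empty, rather than failing halfway through a load.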

New Fields:

| New Field | What is it? |
|---|---|
| FIRST_IN_SUBSET | This is the issue year and month when the concept first appeared in the subset. |
| LAST_IN_SUBSET | This is the issue year and month when the concept last appeared in the subset as a non-retired concept. |
| REPLACED_BY | Concept ID of the concept replacing a retired concept. |

OUCH!

If you developed a program that loads the core subset file, this update likely broke it.

If you are using a text ODBC/OLEDB driver to load the file, the column name changes broke it.

If you are accessing the fields sequentially and splitting each record on the pipe delimiter, the insertion of FIRST_IN_SUBSET before the IS_RETIRED field will break your load program.
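
One way to soften both the rename and the insertion problem is to read fields by header name instead of by position, and to map old names onto new ones. The following is a minimal sketch under a few assumptions: the file has a header row and is pipe-delimited with no quoting, `ALIASES` and `read_subset` are hypothetical names, and the sample rows are made up for illustration (the UMLS_CUI, OCCURRENCE, USAGE and retired-flag values are placeholders, not real data).

```python
import csv
import io

# Hypothetical alias map: June column names -> July column names,
# taken from the name changes listed in this article.
ALIASES = {
    "FSN": "SNOMED_FSN",
    "CONCEPT_STATUS": "SNOMED_CONCEPT_STATUS",
    "IS_RETIRED": "IS_RETIRED_FROM_SUBSET",
}


def read_subset(handle):
    """Yield one dict per record, keyed by July-style column names."""
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    header = [ALIASES.get(name.strip(), name.strip()) for name in next(reader)]
    for fields in reader:
        if fields:  # csv.reader returns [] for blank lines; skip them
            yield dict(zip(header, fields))


# Made-up sample data in both layouts (placeholder values only).
june_file = io.StringIO(
    "SNOMED_CID|FSN|CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|IS_RETIRED\n"
    "40733004|Infectious disease (disorder)|0|C0000000|1|0.01|False\n"
)
july_file = io.StringIO(
    "SNOMED_CID|SNOMED_FSN|SNOMED_CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|"
    "FIRST_IN_SUBSET|IS_RETIRED_FROM_SUBSET|LAST_IN_SUBSET|REPLACED_BY\n"
    "40733004|Infectious disease (disorder)|Current|C0000000|1|0.01|200908|False||\n"
)

june_rows = list(read_subset(june_file))
july_rows = list(read_subset(july_file))
for row in june_rows + july_rows:
    print(row["SNOMED_CID"], row["SNOMED_FSN"], row["IS_RETIRED_FROM_SUBSET"])
```

Because every lookup goes through the header, a column inserted in the middle of the record no longer shifts the fields that follow it.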

If you created a function that uses the coded values in the CONCEPT_STATUS field to support your load logic, it is now broken by the switch to text values. (I don't understand this change at all. It seems to run contrary to the move away from free text. I would change it back...)
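
If your load logic depends on the numeric status, one workaround is to normalize the field so that either form yields a code. This is a sketch, not a definitive mapping: it covers only the three status descriptions that appear in this article, the numeric values are my understanding of the SNOMED-CT concept status codes and should be checked against your release documentation, and `status_code` is a hypothetical helper name.

```python
# Assumed text-to-code mapping for the statuses seen in this article.
# Verify the numeric values against the SNOMED-CT release documentation.
STATUS_TEXT_TO_CODE = {
    "current": 0,
    "duplicate": 2,
    "ambiguous": 4,
}


def status_code(value):
    """Return a numeric status for either a coded or a text value."""
    value = value.strip()
    if value.isdigit():  # June-style coded value, e.g. "0"
        return int(value)
    try:
        return STATUS_TEXT_TO_CODE[value.lower()]  # July-style description
    except KeyError:
        raise ValueError(f"unrecognized concept status: {value!r}") from None


for raw in ["0", "Current", "Duplicate", "Ambiguous"]:
    print(raw, status_code(raw))
```

Raising on an unknown description is deliberate: a silent default would hide the next format change instead of surfacing it.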

Needless to say, this update was a painful one for the early adopter. But if you have already created logic based on the inaugural release of the core subset data, an early adopter is what you are, and that is not without risks.

Along with the painful changes that left our load program writhing on the ground, clutching its face and yelling "You broke my nose!" are some new useful additions.

The FIRST_IN_SUBSET, LAST_IN_SUBSET and REPLACED_BY_SNOMED_CID fields are useful lifecycle management fields that will help with the management of term availability.
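
As an illustration of how the replacement field might be used, the sketch below follows the replacement pointer from a retired concept to the concept that currently stands in for it. Everything here is an assumption made for the example: `resolve_current` is a hypothetical helper, the rows are dicts keyed by column name, the replacement column name is passed in as a parameter because this article shows it both as REPLACED_BY and REPLACED_BY_SNOMED_CID, and the concept IDs "A", "B" and "C" are placeholders rather than real SNOMED-CT identifiers.

```python
def resolve_current(concept_id, rows_by_id, replaced_by_field="REPLACED_BY"):
    """Follow replacement pointers until a concept with no replacement is found.

    Returns the last concept ID reached. If a replacement points outside the
    loaded rows, that outside ID is returned as-is.
    """
    seen = set()
    while True:
        if concept_id in seen:
            raise ValueError("replacement cycle at " + concept_id)
        seen.add(concept_id)
        row = rows_by_id.get(concept_id)
        if row is None:
            return concept_id
        replacement = row.get(replaced_by_field, "").strip()
        if not replacement:
            return concept_id
        concept_id = replacement


# Placeholder rows: A was replaced by B, B by C, and C is still in the subset.
rows_by_id = {
    "A": {"REPLACED_BY": "B"},
    "B": {"REPLACED_BY": "C"},
    "C": {"REPLACED_BY": ""},
}
print(resolve_current("A", rows_by_id))
```

The cycle check costs little and guards against a malformed file sending the loader into an endless loop.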

Patience is a Virtue

If this update frustrated you, I would ask that you focus on the positive and consider that the Core subset is another in a growing line of great, "FREE" work products from our friends at the NLM. 

It is also worth noting that as we in the HIT industry leverage SNOMED-CT, RxNorm and LOINC, the bar will continue to be raised in terms of update frequency and format stability. From the interactions I have had with the NLM, I expect that they are paying attention and will be responsive as we evolve and leverage them more.

Free Advice

As someone who worked at a commercial content provider, I would encourage the following with respect to all data products.

1.) Do not change field/column names lightly if they are included in the file, as developers will rely on them when loading the file with a text driver.

2.) Avoid inserting fields into a record, as some load programs will operate based on field order. If you append new fields to the end of the record you will be less likely to disrupt the load.

3.) Coded fields are better than text fields...always.

Regardless of the constructive criticism...this is good stuff.  If we at Clinical Architecture can help you better take advantage of it, give us a call!
