The following paper was formally released a few weeks ago.
J Am Med Inform Assoc. 2011 July; 18(4): 432–440.
Published online 2011 April 21. doi: 10.1136/amiajnl-2010-000045 PMCID: PMC3128394
Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications
Alan L Rector,1 Sam Brandt,2 and Thomas Schneider1
1School of Computer Science, University of Manchester, Manchester, UK
2Siemens Health Services, Malvern, Pennsylvania, USA
Correspondence to Alan L Rector, School of Computer Science, University of Manchester, Manchester M13 9PL, UK; Email: rector@cs.manchester.ac.uk
Received December 13, 2010; Accepted December 30, 2010.
Abstract
Objectives
(a) To determine the extent and range of errors and issues in the Systematised Nomenclature of Medicine – Clinical Terms (SNOMED CT) hierarchies as they affect two practical projects. (b) To determine the origin of issues raised and propose methods to address them.
Methods
The hierarchies for concepts in the Core Problem List Subset published by the Unified Medical Language System were examined for their appropriateness in two applications. Anomalies were traced to their source to determine whether they were simple local errors, systematic inferences propagated by SNOMED's classification process, or the result of problems with SNOMED's schemas. Conclusions were confirmed by showing that altering the root cause and reclassifying had the intended effects, and not others.
Main results
Major problems were encountered, involving concepts central to medicine including myocardial infarction, diabetes, and hypertension. Most of the issues raised were systematic. Some exposed fundamental errors in SNOMED's schemas, particularly with regards to anatomy. In many cases, the root cause could only be identified and corrected with the aid of a classifier.
Limitations
This is a preliminary ‘experiment of opportunity.’ The results are not exhaustive; nor is consensus on all points definitive.
Conclusions
The SNOMED CT hierarchies cannot be relied upon in their present state in our applications. However, systematic quality assurance and correction are possible and practical but require sound techniques analogous to software engineering and combined lexical and semantic techniques. Until this is done, anyone using SNOMED codes should exercise caution. Errors in the hierarchies, or attempts to compensate for them, are likely to compromise interoperability and meaningful use.
Keywords: Knowledge bases, knowledge representations, methods for integration of information from disparate sources, knowledge acquisition and knowledge management, developing and refining EHR data standards (including image standards), data models, data exchange, controlled terminologies and vocabularies, communication, integration across care settings (inter- and intraenterprise), ontologies, terminology, EHRs
The full free text is available here:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128394/?tool=pubmed
This paper needs to be carefully considered, as its lead author, Alan Rector, is an internationally recognised authority in the area of clinical terminology deployment.
Here is a link to his home page:
http://www.cs.man.ac.uk/~rector/home_page_rector/
The range of issues and problems identified included the following:
- Errors and omissions with propagation and helter-skelter modelling
- Incomplete modeling: myocardial infarction and ischemic heart disease
- Issues with sites of systemic disorders
- Errors in modeling anatomy: Structure-Entire-Part (SEP) triples and the ankle in the abdomen
- Overgeneralized concepts with underspecified ‘fully specified names’
- Lack of distinction between structure and function
- Inconsistent modeling of complications: hypertensive disorders
Detailed examples of each of these are found in the text.
The full text of the conclusions is as follows:
“This study has five classes of outcome:
- On the SNOMED hierarchies. There are sufficient anomalies in the hierarchies that they cannot be used without significant modification in our applications. More generally, we question whether clinicians entering codes or researchers retrieving information understand their implications. As postcoordination relies on accurate classification, it is doubtful that applications using postcoordination will behave predictably.
- On the use of description logic in SNOMED. Using a description logic is both part of the problem and part of the solution. The response to the issues raised here is not to abandon SNOMED's description logic but to use it more effectively. Using a description logic means that correcting root errors found in modules will usually repair analogous problems throughout SNOMED.
- On the possibility of quality assurance of SNOMED. Given modern tooling and computer power, the barriers to quality assurance of SNOMED can now be overcome, although no well-integrated toolset is yet available.
- On practicality of quality assurance of SNOMED. This was a preliminary study and not exhaustive, but it required less than three person-months using poorly integrated tools. Given an integrated toolset, we estimate that a thorough quality assurance of the Core Problem List Subset would require a small team under 2 years, probably less. This would cover a high fraction of all uses of SNOMED. Most changes would be propagated automatically by the description logic into the full SNOMED corpus. Applying these methods to the remainder of the SNOMED findings would require further resources, but they would be minor by comparison with the effort already devoted to SNOMED's development, let alone to those that will be required for its implementations.
- On methods required. Using a description logic requires staff who understand both medical content and description logics. It requires adapting the techniques of software engineering to tracing and managing errors. Space does not permit setting out a detailed methodology. However, key maxims should include:
- Start from clinically important concepts—use clinical intuition.
- Focus on the classified hierarchies—reclassify after every change.
- Work in small modules—so that reclassification is quick.
- Look upwards first and then downwards—there are fewer ancestors than descendants.
- Trace all errors to their root cause—avoid local ‘kluging.’
- Look for analogous errors and repair using consistent patterns—for example, complications and sites.
- Reformulate problematic sections systematically rather than attempting to repair them—for example, head injury and branches in anatomy.
- Use a combination of lexical and semantic methods—as first suggested by Campbell et al19 and now made straightforward using Ontology Patterns Preprocessing Language (OPPL).20
- Test systematically—maintain a suite of ‘unit tests’ covering all issues identified; include tests for unintended consequences of changes; run test suite after every major set of changes and before each release.
Some might argue that many of the erroneous classifications reported here are several steps removed from the original concept in the hierarchies and would be ignored by clinicians. However, the semantics of the description logic underpinning SNOMED is unambiguous. Software and queries must follow them literally. Likewise, the reliability of postcoordination is a function of the reliability of the classifier, which is best determined by its manifestation in the hierarchies.
Until comprehensive quality assurance has been undertaken, anyone using, or mandating, SNOMED should be aware that the hierarchies contain serious anomalies. Should a ‘Reference terminology’ classify diabetes as a disease of the abdomen; fail to classify myocardial infarction as ischemic heart disease; place the arteries of the foot in the abdomen?
Without further quality assurance, clinicians may not realize the implications of what they are saying; researchers may not realize what their queries should retrieve, and postcoordination cannot be expected to be reliable. Interoperability, and therefore meaningful use, will be limited.”
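To make the 'unit tests' maxim a little more concrete, here is a minimal Python sketch of what a hierarchy regression check might look like. Everything in it is an assumption for illustration: the concept names, the is-a edges, the "repaired" modelling and the helper functions `ancestors` and `run_checks` are made up, and the toy hierarchy is asserted by hand rather than inferred by a description logic classifier as SNOMED's is. It is not the method or tooling used in the paper.

```python
from collections import deque

# Toy is-a hierarchy: child -> set of direct parents.
# Hypothetical names and edges, NOT actual SNOMED CT content.
IS_A = {
    "myocardial infarction": {"heart disease"},
    "ischemic heart disease": {"heart disease"},
    "heart disease": {"disorder"},
    "diabetes mellitus": {"disorder of pancreas"},
    "disorder of pancreas": {"disorder of abdomen"},
    "disorder of abdomen": {"disorder"},
    "disorder": set(),
}

def ancestors(concept, is_a):
    """Return every concept reachable by following is-a edges upwards."""
    seen = set()
    queue = deque(is_a.get(concept, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, ()))
    return seen

# Each expectation: (concept, candidate ancestor, should the link hold?).
# The third entry guards against unintended consequences of a repair.
EXPECTATIONS = [
    ("myocardial infarction", "ischemic heart disease", True),
    ("diabetes mellitus", "disorder of abdomen", False),
    ("myocardial infarction", "heart disease", True),
]

def run_checks(is_a, expectations):
    """Return the expectations that the hierarchy violates."""
    failures = []
    for concept, ancestor, should_hold in expectations:
        holds = ancestor in ancestors(concept, is_a)
        if holds != should_hold:
            failures.append((concept, ancestor, should_hold))
    return failures

# A hypothetical repair: fix the two root edges, then re-run the whole suite.
REPAIRED = dict(IS_A)
REPAIRED["myocardial infarction"] = {"ischemic heart disease"}
REPAIRED["diabetes mellitus"] = {"disorder of endocrine system"}
REPAIRED["disorder of endocrine system"] = {"disorder"}

if __name__ == "__main__":
    for label, hierarchy in (("before", IS_A), ("after", REPAIRED)):
        failures = run_checks(hierarchy, EXPECTATIONS)
        print(label, "repair:", len(failures), "failing expectation(s)")
        for concept, ancestor, should_hold in failures:
            verb = "should be" if should_hold else "should not be"
            print(f"  {concept} {verb} under {ancestor}")
```

The point of the sketch is only the workflow: record each anomaly as an expectation, change the root cause, and re-run every expectation so that a repair in one place cannot quietly break something elsewhere. A real suite would run against the classifier's inferred hierarchy, not a hand-written dictionary.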
I suggest anyone who is interested in the area read the whole paper carefully and then e-mail NEHTA (terminologies@nehta.gov.au) to ask just when the work recommended here will be undertaken and finalised. NEHTA decided to deploy SNOMED CT about 4 years ago, and the very limited use so far also suggests there are some significant implementation problems.
It seems that while SNOMED is the best available choice for a clinical terminology, real effort is still needed to make it fully ‘fit for purpose’. Right now it seems it isn’t. It is especially worrying that there seem to be some clear patient safety issues.
Again we seem to be seeing that NEHTA has overpromised and underdelivered. They need to get weaving and push IHTSDO, the international maintainers of SNOMED, for the changes Prof. Rector is suggesting.
David.