Better Data For Better Mapping

Wednesday, 27 May 2015

Clean data makes for better mappings

Most departmental and enterprise electronic medical records operate with local, homegrown vocabularies. As deadlines for exchanging information between healthcare providers draw closer, standardizing and mapping local data sets to national standard terminologies becomes increasingly important, and taking steps to "clean up" local data will improve the quality and efficiency of the terminology mapping.

While there are plenty of tools available (such as RELMA) to help health care organizations map their local terminologies to specific standards like LOINC, the task also presents a range of challenges. One common issue is “noise” in the local data set. For example, many hospital networks append acronyms to diagnostic test names to indicate which floor or department ordered the test. A “CBC 2West” is the same as a “CBC 3East,” but this extraneous noise can clog up some mapping engines, requiring the user to manually select the correct term. Sophisticated tools, like Clinical Architecture’s Symedical application, which we use at Apelon for most mapping engagements, allow the user to define noise-removal rules based on specific alphanumeric strings or lexical patterns. In the above example, we could have written rules to strip out the specific tokens “2West” and “3East,” or we could have written a more powerful rule to strip out any floor designation by telling the system to remove the final word in the test name if it starts with a digit and contains North, South, East, or West. We always balance the complexity of the data-cleaning logic against the size and complexity of the data set. While it might be possible to write rules for every possible permutation, the resulting mapping logic starts to look like a Rube Goldberg contraption and can create as many problems as it solves.
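The floor-designation rule described above can be sketched in a few lines of Python. (This is an illustration of the rule's logic only; Symedical expresses such rules in its own configuration syntax.)

```python
import re

# Drop the final word of a test name when it starts with a digit and
# contains a compass direction -- e.g., "2West", "3East", "4North".
FLOOR_PATTERN = re.compile(r"\s+\d\w*(North|South|East|West)\w*$", re.IGNORECASE)

def strip_floor_designation(test_name: str) -> str:
    """Remove a trailing floor/wing token such as '2West' or '3East'."""
    return FLOOR_PATTERN.sub("", test_name)

print(strip_floor_designation("CBC 2West"))  # CBC
print(strip_floor_designation("CBC 3East"))  # CBC
print(strip_floor_designation("CBC"))        # CBC (no floor token, unchanged)
```

Anchoring the pattern to the end of the name keeps the rule from touching legitimate test names that happen to mention a direction elsewhere in the string.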

Another common challenge in mapping data to a standard is ensuring that all of the local sites are consistent in the terms they select from the standard. Different health care organizations may describe the same test differently; site A might say “rapid strep test” while site B specifies “throat strep test.” Because the specimen is described only in site B’s term, and because LOINC has separate strep-test concepts for “unspecified specimen” and “throat swab,” the same test could end up mapped to different LOINC targets when the descriptions vary, as in the example above. Since the goal of mapping to a standard is to increase consistency and interoperability, your healthcare network should either set a mapping policy to ensure that different sites select the same target when the same test is indicated, or create a content model that accounts for this variation. Planning ahead for these types of questions and challenges not only ensures high-quality work, but can save a lot of time, effort, and money for the health care organization.
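One simple way to implement such a mapping policy is to normalize local synonyms to a single agreed-upon description before mapping, so that every site resolves to the same target. The sketch below assumes a hypothetical synonym table; the entries and the canonical label are illustrative, not actual LOINC content.

```python
# Hypothetical shared synonym table: every local variant of the same test
# is normalized to one canonical pre-mapping description.
CANONICAL_TEST_NAMES = {
    "rapid strep test": "strep a ag throat swab",
    "throat strep test": "strep a ag throat swab",
    "strep screen": "strep a ag throat swab",
}

def normalize_test_name(local_name: str) -> str:
    """Resolve a site's local test description to the network's canonical form."""
    key = local_name.strip().lower()
    return CANONICAL_TEST_NAMES.get(key, key)

# Site A and site B now arrive at the same description, and therefore
# the same mapping target:
print(normalize_test_name("Rapid Strep Test"))   # strep a ag throat swab
print(normalize_test_name("Throat Strep Test"))  # strep a ag throat swab
```

Maintaining the synonym table as a shared, governed artifact is what turns this from a local workaround into a network-wide policy.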

In all our years of mapping local vocabularies to standards like SNOMED CT, LOINC, and RxNorm, we’ve learned that each local vocabulary is different, but at the same time we know that just a little attention to the incoming data set will improve the quality and efficiency of the terminology mapping process. 

About the Author

John Carter