The language used in electronic medical records is deeply confusing to a computer


Think about how many keystrokes you make a day in your practice. Now, multiply that number by 744,437 – the number of registered health practitioners in Australia.

That’ll give you a rough idea of just how much free text is being generated by the health sector every day.

This free text is a gold mine glistening with potentially useful data.

Let’s take something all doctors would really like to know about their patients – what medications they are currently taking.

The patient sitting in front of you can’t remember and they’ve forgotten to bring their medicine kit.

How nice would it be if a computer could trawl through all the written medical records that exist about the patient and compile a list of likely recent medications?

This kind of intelligent system would never be out of work. It could find all the patients overdue for a Pap smear or a vaccination. It could match patients with rare kinds of cancers to clinical trials, compile data on adverse events or create registries … the list is endless.

There’s just one snag: the language used in electronic medical records is deeply confusing to a computer.

Here’s a typical example of what a specialist might plug into their computer at the end of a consult (kindly supplied by Professor Wray Buntine from Monash University):

“Pt 18yof, independent. R) sarp pain++ @rest 0000hrs. AV called. x15 paracetamol? x2 oxycodone x5 valium and ?? number of lasix. pt @ TNH ED 1/7 ago with same present. Hx ideation. 3/12 56 called for assault. Anxious+++ CATT tream involved.”

What does that mean? I have no idea. A computer would have trouble figuring it out too.

A lot of this notation has been made up by the doctor, so it’s not like the computer can cross-reference it with a standardised list of medical abbreviations.

Text mining is tricky at the best of times, but healthcare is one of the hardest areas to work in.

“I call it worst case,” says Professor Wray Buntine, a machine learning expert at Monash University in Melbourne.

“Just about anything else is better because there are huge amounts of jargon and abbreviations. There are hundreds of thousands of different, very specialised drug names, medical disease names. Several of them have multiple acronyms and common language versions.

“And also, there’s a lot of ambiguity. Some jargon has different meanings in different places. In the emergency ward it means one thing, in another ward it might mean something else.

“A lot of medical text isn’t grammatical, it’s in bullet points. You just have a stream of consciousness almost.”

It gets worse though. Doctors don’t always have the best spelling and their fingers slip just like the rest of us, stamping typos into the medical record.

The word ‘diarrhea’ might actually appear as diorhea, dihorrea, diaherea, dihorrea, dierrhea or diorrea in the medical record.

And there are as many typo combinations as there are colours in a rainbow. Atorvastatin, for example, could mutate into aotrvastatin, atrovastatin, taorvastaint, atovasattin, and on it goes…

Typos can easily lead to unintended meanings that mess with computers. Here’s an example helpfully provided by US informatician Associate Professor David Hanauer:

When a doctor types in “9 yr old fe,sle, strong family hx of food allergies”, a bot reading that line might assume the patient had a condition called SLE or Systemic Lupus Erythematosus.

But if you look at an English language QWERTY keyboard, it’s easy to see the mistake: the ‘,’ key is right next to the ‘m’ key and the ‘s’ is right next to the ‘a’. So, the record should read “9 yr old female”. A couple of slipped keys created a medical condition!
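The mechanism is easy to sketch in code. Here’s a toy Python check, asking whether a garbled token could be a known word typed with adjacent-key slips. The keyboard map covers only the handful of keys needed for this example, and the function is invented for illustration:

```python
# A tiny fragment of the QWERTY layout: which keys sit next to which.
# (Illustrative only; a real spelling corrector would map the full keyboard.)
NEIGHBOURS = {
    ",": {"m"}, "m": {",", "n"},
    "s": {"a", "d"}, "a": {"s", "q"},
}

def adjacent_slip(typed: str, intended: str) -> bool:
    """True if every mismatched character in `typed` is a QWERTY
    neighbour of the intended character (same length only, for simplicity)."""
    if len(typed) != len(intended):
        return False
    return all(t == c or c in NEIGHBOURS.get(t, set())
               for t, c in zip(typed, intended))

adjacent_slip("fe,sle", "female")  # → True: ','≈'m' and 's'≈'a'
adjacent_slip("fe,sle", "ferule")  # → False
```

Real spell-checkers combine this kind of keyboard-distance evidence with dictionaries and word frequencies, but the basic idea is the same.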

Sometimes it isn’t even a typo that baffles the computer though. Does ‘CA’ mean calcium, cancer or California? The answer is: it depends on the context.

You’d think that numbers would be fairly straightforward, but they aren’t. Ten could be written as ‘10’ or ‘ten’ or ‘X’ (Roman numerals). And ‘10 years old’ could be written as ‘ten yo’, ‘10 years’, ‘10 yrs’, or ‘tenyo’, or numerous other combinations. Even worse is ‘4’, which can also be ‘IV’, itself a commonly used abbreviation for ‘intravenous’.
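To see why this matters for text mining, here’s a hypothetical sketch of an age extractor. The pattern and the tiny number-word table are invented for illustration, and it deliberately sidesteps the ‘IV’ trap, which needs context to resolve:

```python
import re

# Illustrative only: a tiny lookup for spelled-out numbers.
NUMBER_WORDS = {"four": 4, "ten": 10, "eighteen": 18}

def extract_age(text: str):
    """Return an age in years from a snippet of clinical free text, or None.
    Handles forms like '18 yrs', 'ten yo', '10 years old' and even 'tenyo'."""
    m = re.search(r"\b(\d{1,3}|[a-z]+)\s*(?:years?\s*old|years?|yrs?|yo)\b",
                  text.lower())
    if not m:
        return None
    token = m.group(1)
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)  # None for words we don't recognise

extract_age("Pt is 18 yrs, independent")   # → 18
extract_age("ten yo with food allergies")  # → 10
```

Even this simple case needs a surprising amount of machinery, and a production system would add dozens more patterns plus disambiguation rules.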

Doctors don’t speak the same lingo either. In a single day at a US hospital, doctors described ‘fever’ in 278 different ways and ‘ear pain’ in 123 different ways across 213 patients, according to an analysis of digital medical records.

Medical language has a lot of redundancy built into it too. Cancer could appear as carcinoma, ca, tumour or neoplasm in a medical record, while ‘white blood cell count’ could read: white count, leukocyte count or WBC.

And the English language is filled with homonyms – words that are spelled the same but have completely different meanings. A human could easily distinguish between “feeling blue” and “turning blue”, but a computer might not discern that there’s quite a big medical difference between the two phrases.

Necessarily messy

This still isn’t even scratching the surface of the complexity, nuance and uncertainty that permeate medical records.

For example, a doctor might write in a diagnosis followed by ‘???’. To a human, three question marks is an alarming amount of uncertainty. But how does a computer translate that feeling of doubt in the language of 0s and 1s?

Not everything a doctor writes on a patient’s chart is gospel either. Incorrect diagnoses spread like viruses through medical records, getting replicated every time a new record is created. While a specialist would carefully weigh the evidence behind each diagnosis, computers might be completely thrown off course by these hidden landmines.

We could just force doctors to write in a more standardised, structured format. That would make the computer’s life a LOT easier.

But structured data, such as box ticking, standardised forms or use of ICD-11 codes, actually contains substantially less information than free text. You lose a lot of data about the series of events and the medical reasoning that brought the doctor to a particular diagnosis or treatment plan.

It’s easier for AI to read standardised records but these kinds of records don’t capture the subtleties, uncertainties and broader context that help doctors and patients make good healthcare decisions.

And besides, entering structured data would be clunky and annoying for doctors, slowing down workflow.

In a pivotal paper on the topic, US biomedical informaticians concluded that there was a direct conflict between the desire for structured data and the need for flexibility in medical records.

The “narrative expressivity” of free text should not be sacrificed for the sake of structure and standardisation, they argued. Instead, EMRs should have a hybrid model where both types of data can be recorded.

Reaching for the moon

All these peculiarities of medicine make it incredibly hard for natural language processing systems to parse the text, says Professor Buntine.

What is “parsing”? I hear you ask. It is basically how computers read human language. The computer scans free text and classifies each word it encounters as a verb, noun, subject, object, drug name, disease name, and so on.

Natural language processing (NLP) is a branch of computer science that tries to extract meaning from free text using coding, artificial intelligence and linguistics.

That hasn’t stopped numerous research teams from having a crack at it.

There are some relatively simple forms of NLP, like when your smartphone translates your finger painting into text or when Google finishes your sentences.

Then there are slightly more ‘clever’ systems, such as search engines that retrieve information not just containing the key word you typed in, but also information containing synonyms or related concepts.

And then there’s the holy grail of NLP: an intelligent system that can crunch thousands of pages of free text data in order to help solve a complex medical problem, such as recommending the best treatment for a cancer patient. This is the kind of system that the group behind IBM Watson Oncology and other research teams are hoping to build.

While grand NLP projects in health are garnering a lot of interest, there’s a growing realisation that deciphering free text in the health sector is much more difficult than, say, teaching Watson to play Jeopardy!.

In fact, the failure of IBM Watson to live up to the hype in healthcare has been well documented.

This STAT investigation in 2017 found that Watson for Oncology was floundering. The system didn’t tell doctors anything they didn’t already know. The supercomputer was still in its “toddler stage”, one cancer specialist said. “Teaching a machine to read a record is a lot harder than anyone thought,” one ex-project leader admitted.

For all the above reasons, natural language processing “hasn’t really hit the mainstream for a lot of tasks for medical research”, says David Hanauer, an associate professor of pediatrics at the University of Michigan who has specialised in health informatics.

There is some really low-hanging fruit, however. Over the past 15 years, Associate Professor Hanauer has been building a basic search engine for doctors’ notes called EMERSE.

It’s a bit like doing a Google search. The system doesn’t do any complicated ‘thinking’, it just retrieves information and lets the doctor, or the researcher, figure out the rest.

It’s a free, open source system that is currently being used at the University of Michigan, the University of North Carolina and the University of Cincinnati, and there are a few other centres working on getting it installed.

“Amazingly, most medical record systems don’t really have a good way to help the clinicians find information,” says Associate Professor Hanauer.

What EMERSE does is allow the investigator to enter many different clinical terms, like diagnoses, symptoms and medications. Then it searches the medical record not just for those specific words, but for a range of synonyms, abbreviations, acronyms and shorthand descriptors.
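In spirit, that expansion step looks something like the following Python sketch. The synonym table reuses examples from earlier in this article, but the function and the naive substring matching are my own simplifications, not EMERSE’s actual implementation:

```python
# Illustrative synonym table; a real system maps thousands of terms
# and also handles abbreviations, acronyms and common typos.
SYNONYMS = {
    "cancer": ["cancer", "carcinoma", "ca", "tumour", "neoplasm"],
    "white blood cell count": ["white blood cell count", "white count",
                               "leukocyte count", "wbc"],
}

def expand_and_search(term: str, notes: list[str]) -> list[str]:
    """Return every note mentioning the term or one of its synonyms.
    Naive substring matching; real search uses word boundaries and indexing."""
    variants = SYNONYMS.get(term.lower(), [term.lower()])
    return [n for n in notes if any(v in n.lower() for v in variants)]

notes = ["WBC 4.2, stable", "strong family hx of food allergies"]
expand_and_search("white blood cell count", notes)  # → ["WBC 4.2, stable"]
```

A plain keyword search for “white blood cell count” would have missed the “WBC” note entirely, which is the whole point of the expansion.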

EMERSE can help researchers find a cohort of patients across an entire system, or it can highlight certain terms within notes to speed up chart reviews.

“It will show them where all those things are mentioned in the notes so that they can drill down very quickly and find it and then do the data abstraction that they’re trying to do,” says Associate Professor Hanauer.

EMERSE could also work as a memory aid for doctors and patients during a consultation.

Let’s say a patient comes in with a headache and tells their doctor that a medication they took five years ago worked really well but they can’t remember the name of it.

“So, with a tool like this, you can basically just type in the word ‘headache’,” says Associate Professor Hanauer. “And then within a few seconds… ‘Oh, I see where that term was mentioned… from that time a few years ago’.”

Custom built NLP

You’d expect there to be some collaboration between researchers working on NLP in health because of the advantages of pooling data. (AI systems get smarter when they are exposed to new and different clinical data sets.)

But actually, the opposite is happening in health. Instead of collaborative projects, we see lots of independent research groups building their own NLP systems from scratch.

Why? Because health data has a lot of privacy restrictions. While normal human language can be sourced through Wikipedia, the language of doctors is only found inside clinical notes, and research teams need special permission to work with these sensitive records.

It might seem like a duplication of effort for so many research groups to be trying to solve the same NLP problems independently. But medical records look and sound very different depending on which group of doctors made them, so custom built NLP might actually be superior to joint projects anyway.

Dr Rachel Knevel, a senior scientist at Leiden University Medical Centre in the Netherlands, is working on one such project. She’s interested in delving into the question of what triggers rheumatoid arthritis using cluster analysis.

To do this, it’s a lot faster to develop an NLP program that can quickly scan tens of thousands of patient charts, rather than try to read each chart individually, she says. “I think it’s a way to make science more efficient,” she says.

Dr Knevel’s team has been developing a system that can identify clinical records that mention drugs for rheumatoid arthritis. The major problem is that the record is full of typos and spelling mistakes, she says.

Methotrexate is often written into the clinical record using the abbreviation ‘MXT’. But sometimes there’s a typo, and the word becomes ‘MTX’. Sometimes it’s ‘TXM’ or ‘NTX’.

The more letters that switch place or are replaced by another letter, the less likely it is that the doctor intended to write ‘MXT’.

Identifying the likely typos for methotrexate is essentially a maths problem, and the Leiden team solved it using something called the “Damerau-Levenshtein distance”. This is an algorithm that measures the distance between two words as the number of character operations (additions, deletions, substitutions or swaps of adjacent characters) required to transform one word into the other.
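A minimal Python version of that distance (the “restricted” variant, which counts insertions, deletions, substitutions and swaps of adjacent characters) looks like this; it is an illustrative sketch, not the Leiden team’s implementation:

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    the number of insertions, deletions, substitutions and adjacent
    transpositions needed to turn `a` into `b`."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                       # delete everything
    for j in range(len(b) + 1):
        d[0][j] = j                       # insert everything
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

damerau_levenshtein("MXT", "MTX")  # → 1 (one adjacent swap)
damerau_levenshtein("MXT", "TXM")  # → 2 (a non-adjacent swap costs two edits)
```

A threshold on this distance gives exactly the intuition in the article: the more edits a token is away from ‘MXT’, the less likely it was meant to be ‘MXT’.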

They trained their algorithm on around 6,000 EMR entries and validated it on around 3,000 more. The algorithm was highly accurate at detecting entries in which rheumatoid arthritis drugs (including typo versions) were prescribed.

Another example of an independent NLP project is happening in Geneva, Switzerland.

Patrick Ruch, from the text mining group at the Swiss Institute of Bioinformatics, is leading several NLP projects to support personalised medicine in oncology.

Dr Ruch says his team has designed an algorithm that distils the “large universe of papers” into a list ranked in order of importance, with the most robust research on common mutations at the top and the more personalised, but less robust, studies on very specific variants below.

It’s a similar system to commercial products like IBM Watson, says Dr Ruch. The major difference is that the Swiss system has a closed loop with the oncologists, so if a third line treatment was recommended by the computer system, the research team knows within a few weeks whether this was helpful or not in real life.

“I cannot claim that we have saved someone,” says Dr Ruch. “It is too early.”

Australian made NLP

CSIRO’s Data61 has a team, spread across Sydney and Brisbane, that works on NLP problems in healthcare.

They’ve got three main projects underway, according to senior research scientist Dr Sarvnaz Karimi:

  1. In a project that started around five years ago, CSIRO’s Data61 team used data from askapatient.com to identify what drug adverse effects were being experienced in the community.

Anyone can contribute to the US website askapatient.com. It publishes reviews of medications by patients, including lots of rich data about potential side effects.

The Data61 system identified whether people were reporting their own symptoms or a friend’s, and whether there was language of negation (i.e. ‘I didn’t get a headache’).

The system then translated the informal language used by patients into standardised medical language, so that statistics could be created from the free text data.
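Negation handling of this kind is often done with rule-based cue lists, in the style of the well-known NegEx algorithm. Here is a deliberately simple sketch; the cue list and function are invented for illustration and are not the Data61 system:

```python
# Illustrative negation cues; real rule sets (e.g. NegEx) have many more,
# plus scope rules for how far a cue reaches into the sentence.
NEGATION_CUES = ("didn't", "did not", "no ", "denies", "without")

def is_negated(sentence: str, symptom: str) -> bool:
    """True if the symptom is mentioned but preceded by a negation cue."""
    s = sentence.lower()
    if symptom not in s:
        return False
    before = s.split(symptom)[0]  # text to the left of the mention
    return any(cue in before for cue in NEGATION_CUES)

is_negated("I didn't get a headache", "headache")      # → True
is_negated("Terrible headache on day 2", "headache")   # → False
```

Getting this wrong flips the meaning of a report entirely, which is why negation detection is a standard first hurdle in clinical text mining.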

  2. The Data61 team also created a system that could pick up the early rumblings of thunderstorm asthma over Twitter.

By reading tweets relating to asthma and breathing problems, their algorithms would have detected the 2016 thunderstorm asthma crisis in Melbourne five hours before the first news report, according to a study published in Epidemiology this year.

  3. The third NLP project by Data61 deals with a big problem in medical research – the difficulty of matching patients to particular clinical trials.

Clinical trials often don’t have enough participants because it is difficult for patients to figure out which trials they are eligible for.

To enrol in a clinical trial for cancer, for example, a patient might need to have a particular type of genetic mutation and no comorbidities.

There are databases like clinicaltrials.gov in the US but these aren’t easily searchable. Patients and doctors still have to read through each trial and figure out if they fit the criteria.

To solve this problem, the Data61 team is working on an NLP program where a patient can type in their specific characteristics (such as their age and gene mutation) and the algorithm brings up all the trials that contain free text descriptions that match.

The reason why NLP is needed is because there are many different ways to describe gene names and age brackets, so a simple word search is insufficient.

Layers of difficulty

Professor Jon Patrick left the University of Sydney around seven years ago to start two NLP companies in healthcare: HLA Global and iCIMS.

“We’re past the startup valley of financial death,” he jokes during our interview at his Eveleigh-based office in Sydney.

Professor Patrick is working closely with the Sydney Adventist Hospital to create cancer patient records for multidisciplinary care team meetings.

The system draws in reports from surgeons, chemotherapists and radiotherapists, and, as an “extra twist”, the hospital has asked the company to design an NLP program that pulls data from pathology reports into a structured summary, says Professor Patrick.

“We’ve got lung and breast going at the moment, and we’ve just started to work on urology and gynaecology,” he says.

But Professor Patrick’s major client is from the US – the California Cancer Registry.

The registry mines medical records to generate data on cancer trends. To speed up this work, they’ve hired Professor Patrick’s company to create NLP systems that can help sift through the thousands of clinical reports.

There are several layers of difficulty involved in teaching a computer to accurately read these reports, says Professor Patrick.

A pathology report, for example, classically has a macro description, micro description, final diagnosis, description of the specimen and clinical history, but it can also have supplementary records with biomarkers and genomic tests, he says.

“So, that diversity just makes it richer and trickier to get what you want,” says Professor Patrick.

“And if you think about pathology, again, the core information about the diagnosis should be in the final diagnosis section. ‘Should’ be. But all too often it’s not.”

Even when an NLP system appears to be working fine, it can get thrown every time a new source of data is added – such as a new pathology provider that structures its reports slightly differently, he says. The company often revises its NLP programs on a weekly basis to keep on top of these changes.

Sometimes the NLP system can identify a single doctor whose eccentric sentence structure or word choice is continually causing the computer headaches.

“We certainly see reports that cause trouble where we can instantly identify who the author is,” says Professor Patrick.

It seems like some electronic medical records are about as indecipherable to a computer as a handwritten doctor’s note is to the average human.