When the thunderclouds blew pollen over Melbourne on 21 November 2016, causing a 10-fold increase in hospital presentations for asthma, it took some time for the authorities to recognise that they had a city-wide emergency on their hands.
But hours before the hospital deluge, people were reporting symptoms of wheezing, coughing and breathing difficulties on their public Twitter accounts.
Researchers at CSIRO’s Data61 are now using these types of tweets to demonstrate how social media might form part of an early-warning system for such rapidly developing asthma outbreaks.
In all, the team developed 18 algorithms to analyse the tweets. Three of them would have detected the thunderstorm asthma outbreak up to nine hours before the first official report.
“We do not expect that social media alone will be useful for epidemic intelligence,” said lead researcher Dr Aditya Joshi, a computational linguist and postdoctoral fellow at CSIRO’s Data61.
“Traditional forms of syndromic surveillance or epidemic intelligence are extremely valuable. However, this work shows that social media (which tends to be a real-time source of information) can be a useful and viable alternative, especially with respect to events, like the thunderstorms, which require a quick response.”
The research was completed with the assistance of Raina MacIntyre, a professor of global biosecurity at the Kirby Institute at UNSW, and Dr Cecile Paris, the chief scientist of Data61, along with other collaborators there.
While tweets were a useful source of data, the task of building an alert system using only tweets was a complicated one, said Dr Joshi.
The first big problem is that of false alarms. No one wants an alert system that cries wolf all the time.
To solve this “alert swamping” issue, the researchers narrowed the dataset to the first health-related tweet from each unique user each day, geo-restricted to Melbourne.
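As a rough illustration of that narrowing step, here is a minimal Python sketch. It is not the researchers' code: the record fields (`user`, `time`, `city`, `health_related`) and the function name are assumptions made for this example, and it treats "a day" as the calendar date of each tweet's timestamp.

```python
from datetime import datetime


def first_daily_reports(tweets, city="Melbourne"):
    """Keep the earliest health-related tweet per user per day within `city`.

    Each tweet is a dict with hypothetical keys: 'user', 'time' (a datetime),
    'city' and 'health_related' (a bool set by some earlier step).
    """
    seen = set()  # (user, calendar date) pairs already counted
    kept = []
    for tweet in sorted(tweets, key=lambda t: t["time"]):
        if tweet["city"] != city or not tweet["health_related"]:
            continue
        key = (tweet["user"], tweet["time"].date())
        if key in seen:
            continue  # this user has already been counted today
        seen.add(key)
        kept.append(tweet)
    return kept


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"user": "a", "time": datetime(2016, 11, 21, 18, 0),
         "city": "Melbourne", "health_related": True},
        {"user": "a", "time": datetime(2016, 11, 21, 17, 0),
         "city": "Melbourne", "health_related": True},
    ]
    for tweet in first_daily_reports(sample):
        print(tweet["user"], tweet["time"].isoformat())
```

Counting each person at most once a day means a single user posting repeatedly cannot set off an alert on their own.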
Another difficulty with tweets was that people often used health words as a figure of speech, Dr Joshi said.
For example, someone might tweet: “I saw the new trailer of this film. Oh my god, it’s awesome. I can’t breathe.” This is clearly not a description of physical symptoms.
The researchers separated these figurative tweets using a pre-trained algorithm that uses vectors to tell how closely connected two words are in terms of their meaning.
This algorithm, called GloVe, is publicly available and has been trained on 27 billion words of text to recognise the connections between words.
Each word is represented as a point in 200-dimensional space, and the algorithm calculates the connectivity between two words by averaging out the vectors.
For the tweet, “I saw the new trailer of this film. Oh my god, it’s awesome. I can’t breathe,” the algorithm would detect that the words “trailer” and “film” were far away in meaning from a cluster of health-related words.
By comparison, the tweet: “There’s smoke today, and I can’t breathe”, contains content words that are mostly health-related so this would be classified as a personal health report by the algorithm.
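The idea of averaging word vectors and comparing them with a cluster of health words can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Data61 method: the three-dimensional vectors are invented for the example (the real ones have 200 dimensions), and the list of health words, the 0.8 threshold and the function names are all hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "wheezing": (1.0, 0.0, 0.1),
    "cough": (0.9, 0.0, 0.2),
    "breathe": (0.9, 0.1, 0.0),
    "smoke": (0.8, 0.2, 0.1),
    "trailer": (0.0, 0.9, 0.3),
    "film": (0.1, 1.0, 0.2),
    "awesome": (0.1, 0.7, 0.6),
}

# Hypothetical "cluster" of health-related words.
HEALTH_WORDS = ("wheezing", "cough", "breathe")


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a, b):
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def health_similarity(content_words):
    """Compare a tweet's averaged vector with the averaged health vector."""
    tweet_vector = mean_vector(content_words)
    if tweet_vector is None:
        return 0.0
    return cosine(tweet_vector, mean_vector(HEALTH_WORDS))


def is_health_report(content_words, threshold=0.8):
    """Flag a tweet whose content words sit close to the health cluster."""
    return health_similarity(content_words) >= threshold


print(is_health_report(["trailer", "film", "awesome", "breathe"]))
print(is_health_report(["smoke", "breathe"]))
```

With these toy numbers, the film tweet's average is pulled away from the health words by "trailer", "film" and "awesome", while the smoke tweet stays close to them.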
“[However], there are lots of things that might not work about this approach,” Dr Joshi said.
Firstly, there are many symptoms that people would feel uncomfortable or embarrassed to share on social media, limiting the types of diseases that a tweet-based alert system could track.
Secondly, anxiety spreading through the community tended to increase the number of fear-related tweets about health, which could throw off the AI, he said.
“So, for example, the tweet ‘I have a rash on my hand. Oh my god, do I have measles?’ Now, the person is reporting a rash, but this is not really a report or a confirmation of measles,” he said.