An algorithm that detects social bot activity on Twitter in real time could help prevent the spread of misinformation and make it easier for first responders to detect major events, according to a research scientist at CSIRO’s Data61.
The algorithm uses machine learning, artificial intelligence (AI) and natural language processing (NLP) to distinguish between genuine conversations and bot-generated messages, creating a set of ‘factual’ parameters and a rapid filtering system that delivers results in real time.
Developed by Data61’s Dr Mehwish Nasim in collaboration with Dr Jonathan Tuke, Dr Lewis Mitchell, Prof Nigel Bean and Andrew Nguyen from the University of Adelaide, the tool was originally conceived solely to detect major events. However, after the team found much of its research data polluted by automated posts, the need to filter out these accounts arose.
“We looked at different ways bots were posting content on Twitter and compared them to how a normal user behaves,” explains Dr Nasim. “For example, legitimate users often tweet about different topics, use a mix of hashtags, post URLs to a variety of pages, tag other users, and have a predictable posting frequency.”
“Social bots, on the other hand, have a high tweeting frequency; their messages are low in topic diversity and regularly include the same URLs and hashtags.”
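The behavioural contrast Dr Nasim describes lends itself to simple feature-based scoring. The sketch below is purely illustrative and is not the team’s actual model: the tweet structure, feature names and cutoff values are all assumptions made for the example.

```python
def behaviour_features(tweets):
    """Summarise an account's recent behaviour from a list of tweets.

    Each tweet is assumed (for illustration only) to be a dict with
    'hashtags', 'urls' and a Unix 'timestamp'.
    """
    hashtags = [h for t in tweets for h in t["hashtags"]]
    urls = [u for t in tweets for u in t["urls"]]

    def diversity(items):
        # Unique items as a fraction of all items: 1.0 means no repeats.
        return len(set(items)) / len(items) if items else 1.0

    times = sorted(t["timestamp"] for t in tweets) or [0.0]
    span_hours = max((times[-1] - times[0]) / 3600.0, 1.0 / 60)

    return {
        "tweets_per_hour": len(tweets) / span_hours,  # bots: very high
        "hashtag_diversity": diversity(hashtags),     # bots: same hashtags
        "url_diversity": diversity(urls),             # bots: same URLs
    }


def looks_like_bot(features, rate_cutoff=20.0, diversity_cutoff=0.3):
    """Flag accounts that post very often while repeating hashtags and URLs.

    The cutoffs are arbitrary placeholders, not values from the research.
    """
    return (features["tweets_per_hour"] > rate_cutoff
            and features["hashtag_diversity"] < diversity_cutoff
            and features["url_diversity"] < diversity_cutoff)
```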
The tool differs from other models in two key ways: it does not require complete access to a user’s profile (a method that compromises user data privacy), and it streams up-to-the-second information, whereas other models scrape user information and analyse it only after a delay.
The algorithm uses the content of a user’s historical and current tweets to determine the poster’s intention, a method that also significantly streamlines information gathering, providing a timely, scalable and cost-effective way to identify and sort accounts.
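Because the tool scores posts on a live stream rather than a scraped archive, each tweet can be checked against its author’s recent history the moment it arrives. Below is a minimal sketch of that streaming pattern, reusing the hypothetical helpers above; the 'user_id' field and window size are further assumptions.

```python
def filter_stream(tweet_stream, history=None):
    """Yield only the tweets whose authors do not look automated.

    `tweet_stream` is any iterable of tweet dicts (assumed to carry a
    'user_id' field); `history` accumulates each user's recent tweets.
    """
    history = {} if history is None else history
    for tweet in tweet_stream:
        window = history.setdefault(tweet["user_id"], [])
        window.append(tweet)
        del window[:-200]  # keep a bounded per-user history window
        if not looks_like_bot(behaviour_features(window)):
            yield tweet  # genuine posts pass through in real time
```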
By rapidly filtering out polluting posts in real time, the algorithm enables accurate predictions about upcoming crises or events to be made and passed to emergency services and responders. It can also help prevent the spread of false narratives.
During the recent bushfires, the hashtag #ArsonEmergency flooded Twitter feeds; however, upon analysing Twitter post data, Dr Nasim discovered a number of bots and trolls promoting the hashtag to mislead the public.
The image above shows that bots were spreading misinformation using content from the URLs with the lower Gini scores (e.g. digitaltrends.com). A Gini score is a measure of the relative amount of inequality in a set of data.
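For a sense of what that measure captures: the Gini coefficient of a list of non-negative values is 0 when the values are perfectly even and approaches 1 when a single item holds almost everything. The snippet below is the standard textbook computation with made-up counts, not necessarily how the researchers applied it to the URL data.

```python
def gini(values):
    """Gini coefficient: 0 = perfectly even, near 1 = highly concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form of the standard Gini formula.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical counts of how often four accounts shared a given URL:
print(gini([5, 5, 5, 5]))   # 0.0 -- shared evenly (low Gini)
print(gini([1, 1, 1, 17]))  # 0.6 -- dominated by one account (high Gini)
```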
“There was a lot of polarisation around this topic,” says Dr Nasim. “People who were already climate-change deniers were tweeting about arson emergency and created an echo-chamber where the spread of this narrative was reinforcing their existing beliefs.”
Dr Nasim and her colleagues are currently working on a paper analysing the spread of misinformation on Twitter surrounding Australia’s Black Summer bushfires, with Dr Nasim noting that a mechanism to detect false information in real time is needed now more than ever.
“People should be able to make decisions based on facts, and as scientists, we are faced with the challenge of separating misinformation from facts.”
Misinformation and its spread are a major concern for governments, democracy and society, and its popularity as a tool of influence has continued to grow since it rose to worldwide prominence in 2016. It is essential that people have access to factual and truthful information so they can make well-informed decisions.
As we move closer to the 2020 US presidential election and enter a critical stage of climate-change action, this algorithm could become an essential tool, not only in combating the spread of false news, but also in identifying crises, such as outbreaks of illness and civil unrest, and the services and responses required to manage them successfully.