Horizon: the EU Research & Innovation magazine | European Commission

Social media lie detector separates online fact from fiction

New tools are being designed to aggregate, correlate, and link together online information. Image: Shutterstock/ zamzawawi isa

Social media is increasingly being used as a source of news, but the problem is that you can’t always trust what you read online. Now, EU researchers are tackling this issue head-on by creating software to help people decide whether they can rely on information found on Facebook or Twitter. 

‘During the London riots (in 2011), there were seven (online) rumours which were analysed in more depth by journalists and social scientists,’ said Kalina Bontcheva, a senior researcher in computer science at the University of Sheffield, UK. ‘Out of those one was found to be true, one was unverified, and the other five were found to be false.’

The immediate danger of such rumours is that they can lead people who monitor social media professionally, such as the emergency services or the media, to make the wrong decisions.

Bontcheva leads the EU-funded PHEME project, the aim of which is to build algorithms to assess the veracity, or truthfulness, of stories that appear on social media. These algorithms will feed into a system that can analyse, in real time, whether an online rumour is likely to be true.

‘We don’t want to have a system say “this is true” or “this is false”,’ said Bontcheva. ‘We can present the person using the system with a certain level of confidence – we think this is likely to be true or we think this is likely to be false – but we also want to give them all the evidence so they can make up their minds.’

To build their algorithms, the PHEME researchers are investigating how online rumours behave in terms of where they come from, who spreads them, how fast they spread, and whether there are any denials, supporting statements or additional evidence. They have already devised a way of continuously collecting all the online information related to a particular topic or event, and the next step is to analyse this information automatically to spot rumours emerging around that topic or event.
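
The signals listed above (where a rumour starts, who spreads it, how fast it moves, and how many posts deny or support it) can be illustrated with a minimal Python sketch. This is not PHEME's code: the `Post` record, the stance labels and the `summarise_rumour` function are hypothetical names invented for illustration, and the example posts are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    """One social media post about a rumour (hypothetical record, not PHEME's schema)."""
    author: str
    posted_at: datetime
    stance: str  # assumed labels: "support", "deny" or "neutral"


def summarise_rumour(posts):
    """Summarise origin, reach, speed and reactions for one rumour."""
    if not posts:
        raise ValueError("need at least one post")
    ordered = sorted(posts, key=lambda p: p.posted_at)
    span_hours = (ordered[-1].posted_at - ordered[0].posted_at).total_seconds() / 3600
    return {
        "origin": ordered[0].author,                         # where it came from
        "spreaders": len({p.author for p in ordered}),       # distinct accounts spreading it
        "posts_per_hour": (len(ordered) / span_hours         # how fast it spreads
                           if span_hours > 0 else float(len(ordered))),
        "denials": sum(p.stance == "deny" for p in ordered),
        "supports": sum(p.stance == "support" for p in ordered),
    }


# Made-up example data, for illustration only.
start = datetime(2024, 1, 1, 12, 0)
example_posts = [
    Post("alice", start, "support"),
    Post("bob", start + timedelta(minutes=30), "support"),
    Post("carol", start + timedelta(hours=1), "deny"),
    Post("alice", start + timedelta(hours=2), "neutral"),
]
summary = summarise_rumour(example_posts)
print(summary)
```

A real system would also have to decide automatically which posts belong to the same rumour and what stance each one takes, which is the harder part the researchers describe.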

One of the aims of the PHEME system is to help journalists spot emerging stories that may be related to a bigger topic. To design algorithms that can recognise stories of interest, the researchers are being guided by input from a group of journalists from swissinfo, who are partners on the project.

During the recent social unrest that followed the shooting of Michael Brown in Ferguson, US, the journalists used the PHEME system to track the event on social media through hashtags and keywords such as #Ferguson and #MikeBrown. They then identified a number of emerging stories that were coming out of social media and could be of interest to journalists, such as reports that media were not being allowed to enter Ferguson. The journalists then fed this information back into the PHEME system to help refine its algorithms.

Bontcheva says that once the system is able to spot emerging stories and rumours automatically, it will be a step ahead of existing fact-checking websites. Current websites tend to limit themselves to analysing the truthfulness of an existing story by collecting evidence from users.

‘The idea is to try and move away from manual crowdsourcing,’ she said. ‘In a sense the information is already out there, it’s on Twitter, it’s in the blog posts, it’s there. The challenge becomes to aggregate it, to correlate it, and to link it together.’

Medical controversies

The PHEME software is also being designed to spot potential medical controversies that are brewing online, such as the debate around vaccinating children. The idea is to create a tool to help the medical community identify and respond to genuine public concerns, such as on the side effects of certain medicines, but also to counter any misinformation that is being spread.

The system will help assess credibility by analysing who is contributing to either side of the debate. ‘We try to identify the key sources of these messages and tag those sources with their reliability and trustworthiness indicators, for example, have they spread similar rumours in the past or not?’ said Bontcheva.

‘This is potentially very relevant to the medical use case because there we do know that certain websites or certain Twitter accounts tend to follow a certain agenda.’

Reliability of online sources is a question that is also under investigation by researchers on the EU-funded Reveal project, who are likewise designing web software that will allow journalists to ascertain the truthfulness of online stories.

Dr Christos Georgousopoulos of INTRASOFT International, coordinator of Reveal, says their software analyses three aspects of a social media post to determine its veracity: the contributor, the content and the context.

‘For the contributor the idea is to see what we can find out about a particular source of information that is posted by someone,’ he said. ‘It will have different scores for reputation, history, popularity, influence and presence.’

‘For content, there are approaches for instance to check if an image is original or not, if it was altered. For context, we try and see if the what, when, and where contexts are given.’

The information on all three aspects is combined to give an indication of how likely a post is to be true.
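
The article does not say how Reveal combines the three aspects. As a purely illustrative sketch, assuming each aspect has already been scored between 0 and 1, a weighted average is one simple way to turn them into a single indication. The function name `combine_scores` and the weights are hypothetical.

```python
def combine_scores(contributor, content, context, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three aspect scores, each assumed to lie in [0, 1].

    This is an assumed combination rule for illustration, not Reveal's method.
    """
    scores = (contributor, content, context)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each score must be between 0 and 1")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Equal weights: a trusted contributor, middling content check, weak context.
equal = combine_scores(0.8, 0.5, 0.2)

# Counting the contributor twice as heavily as the other two aspects.
contributor_heavy = combine_scores(0.8, 0.5, 0.2, weights=(2.0, 1.0, 1.0))

print(round(equal, 3), round(contributor_heavy, 3))
```

Whatever rule is used, the result is only an indication for the journalist to weigh, not a verdict.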

Truthfulness ratings

The researchers are building a prototype system that gives the journalist an overview of how likely a story is to be true by rating the different aspects. For example, a source may score five out of 10 for popularity, seven out of 10 for influence and nine out of 10 for reputation. However, it remains up to the journalist to determine if they think information from that source is credible.
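
To make the example ratings concrete, the short Python sketch below shows one way such an overview might be laid out. It uses the three figures quoted above; the function name `format_overview` and the layout are assumptions, not the Reveal prototype's interface. It deliberately reports each rating without adding a verdict, since that judgement is left to the journalist.

```python
def format_overview(ratings, scale=10):
    """Return one line per aspect, e.g. 'popularity: 5/10', with no overall verdict."""
    lines = []
    for aspect, score in ratings.items():
        if not 0 <= score <= scale:
            raise ValueError(f"{aspect} must be between 0 and {scale}")
        lines.append(f"{aspect}: {score}/{scale}")
    return "\n".join(lines)


# Ratings taken from the example in the text.
overview = format_overview({"popularity": 5, "influence": 7, "reputation": 9})
print(overview)
```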

‘The system will try to give you as much information as possible for you to judge but at the end you will be the one to say yes or no,’ said Dr Georgousopoulos.

The end goal is a commercial product that provides a variety of different services, such as collecting all the relevant posts in an area, checking whether content has been altered, and analysing the context in which the posts have appeared. Dr Georgousopoulos says that combining all these services in a single environment sets Reveal apart from current tools on the market, which tend to be limited to a single aspect of verification.

The main challenge facing researchers is the legal environment. ‘For instance, according to the European legislative framework, which is currently under reform, it is not clear how you can legitimately combine information from different profiles of users,’ said Dr Georgousopoulos.

Although both PHEME and Reveal intend for their systems to be used by specialist audiences, Bontcheva hopes that research in this field could one day play a bigger role in society by focusing attention on what people are talking about online.

‘I would like to have a system where we can capture what society is thinking about a particular event or controversy or political issue. During the elections, can we give people more of a voice, can we engage them better, can we improve the political process by giving them tools like this?’
