Horizon: the EU Research & Innovation magazine | European Commission
Computers learning to read, watch and understand

Open data repositories such as DBpedia are helping software to understand what it reads on a web page. Image: Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak.

Computers are being taught to understand the meaning behind words and images on the internet, bringing online a new generation of intelligent software that can perform tasks that, until now, only humans could do.

It’s thanks to machine learning, artificial intelligence and an emerging branch of computer science called the semantic web.

‘Semantic technologies can play a very important role to find a smarter way to get high-level information that is currently retrieved by human operators,’ said Andrea Ciapetti, open source specialist at Italian technology firm Innovation Engineering.

He is working with the Madrid police to create a search engine that can analyse video footage and detect criminal acts such as pickpocketing.

Software first identifies events in the video that might be of interest; semantic web technology then examines these events and picks out the ones that might indicate a crime is taking place.

His company is also working on the EU-funded DISCOVER-IT project to create a semantic search tool for start-up companies. The tool scours the web for relevant patents and data from open access research papers to help start-ups come up with innovative ideas.

Semantic technology works by annotating words and images with supplementary information so that software can understand their meaning.

‘This is what the semantic web is about, turning the web of documents as it is today, as it is for humans, into a web of data that is for software consumption,’ said Luca De Santis, from Net7, an Italy-based web technology firm.

He is the project manager of StoM, an EU-funded project which is working out how to commercialise two semantic search engines developed as part of an earlier project, SemLib.

One of the products, called EventPlace, is a search tool that brings together information relating to an event, while the other, PunditBrain, is used to create annotations on web documents that, thanks to semantic technologies, are easier to search and reuse.

Semantic search can link similar ideas together in this way because it adds explanatory information to web pages, or links them to external repositories that give meaning and context to words.

Wikipedia for computers

Data repositories such as DBpedia, a version of Wikipedia for computers, are at the heart of semantic technology. These can be used to annotate web pages, making them easier for semantic systems to understand.

This means that when software comes across an ambiguous word, such as ‘rock’, which could refer to music or geology, it can check the repository to find out which meaning is intended.

‘I can provide a link to the DBpedia entity, and say, “Ok, this is about music”,’ said De Santis.
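
To make the idea concrete, here is a minimal Python sketch of how an annotation might tie the word ‘rock’ to one of two DBpedia entities. It is an illustration only, not the approach used by Net7 or DBpedia: the cue-word lists and the function names `disambiguate` and `annotate` are invented for this example, and the two entity links are assumed to follow DBpedia's usual naming pattern.

```python
# Hypothetical cue words for each candidate meaning of "rock".
# The URIs are assumed to follow DBpedia's usual resource pattern;
# the cue-word sets are invented for this illustration.
CANDIDATES = {
    "http://dbpedia.org/resource/Rock_music": {
        "band", "guitar", "album", "concert", "song",
    },
    "http://dbpedia.org/resource/Rock_(geology)": {
        "mineral", "basalt", "sediment", "volcanic", "strata",
    },
}


def disambiguate(sentence, candidates=CANDIDATES):
    """Return the entity whose cue words overlap most with the sentence,
    or None when no cue word appears at all."""
    words = {w.strip(".,!?'\"‘’").lower() for w in sentence.split()}
    scores = {uri: len(words & cues) for uri, cues in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def annotate(sentence, word="rock"):
    """Attach the chosen entity link to the ambiguous word as an annotation."""
    return {"text": sentence, "word": word, "entity": disambiguate(sentence)}


if __name__ == "__main__":
    print(annotate("The band played rock with a loud guitar"))
    print(annotate("Basalt is a volcanic rock"))
```

Real systems weigh far richer context than a handful of cue words, but the output has the same shape: a piece of text plus a link that tells software which meaning is intended.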

Semantic technologies are already being used to group news articles together when they are about the same thing, or to understand what a Facebook user is interested in by looking at the similarities between pages they have ‘liked’.

‘Facebook, through semantic web technology, can understand what you really like,’ said De Santis. ‘Is the page about restaurants, or is it about rock music?’
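
A rough sketch of this kind of inference, in Python, might count which topics recur across the pages a user has liked. This is not Facebook's actual method, which the article does not describe; the page names, topic labels and the function `infer_interests` are all hypothetical.

```python
from collections import Counter


def infer_interests(liked_pages, top_n=1):
    """Return the topics that occur most often across a user's liked pages.

    liked_pages maps a page name to the set of topic labels that page
    has been annotated with (all labels here are hypothetical).
    """
    counts = Counter(
        topic for topics in liked_pages.values() for topic in topics
    )
    return [topic for topic, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    likes = {
        "Page A": {"Rock_music", "Guitar"},
        "Page B": {"Rock_music", "Concert"},
        "Page C": {"Restaurant"},
    }
    print(infer_interests(likes))
```

Because the pages are annotated with shared topic labels rather than free text, two pages that never use the same words can still be recognised as being about the same thing.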

Mathematical certainty

One of the problems with using semantic techniques is that, in many areas, meaning can be difficult to define in a mathematically precise way, creating data that is ill-suited to the logical reasoning used by computers.

One example is wine, where words that are used to describe taste, such as sweet or fruity, can mean slightly different things to different people. Yet for a semantic technology search to be able to answer questions, such as which wine goes well with a specific dish, it needs to understand and use these different terms.

‘Building a logical theory for these real-world domains is quite tricky,’ explained Dr Steven Schockaert, principal investigator of the FLEXILOG project.

He is working on a way to spatially model the meaning of words, so that it can be used to answer logical search queries.
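
One simple way to picture a spatial model is to place each term at a point and treat nearby points as similar in meaning. The Python sketch below does this for three wine terms. It is not the FLEXILOG method, which the article does not detail: the coordinates, the two axes and the function `nearest` are invented purely for illustration.

```python
import math

# Hypothetical coordinates on two invented axes (sweetness, fruitiness).
# The values are made up for illustration, not learned from data.
POINTS = {
    "sweet": (0.9, 0.4),
    "fruity": (0.6, 0.9),
    "dry": (0.1, 0.3),
}


def nearest(term, points=POINTS):
    """Return the other term whose point lies closest to the given term."""
    origin = points[term]
    others = (t for t in points if t != term)
    return min(others, key=lambda t: math.dist(origin, points[t]))


if __name__ == "__main__":
    print(nearest("sweet"))
```

In such a space, a query about ‘sweet’ wines could also return wines described as ‘fruity’ if the two terms sit close together, even though neither word strictly implies the other.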

‘The goal would be to have a system that can just learn, on its own, information about many different domains as it’s reading information on the web,’ said Dr Schockaert, whose work has been funded by the EU’s European Research Council.

The plan is to feed the system information that it can use to learn.

‘Initially we’re going to work on Wikipedia, then we’re going to scale it up to a substantial fragment of the web.’ 
