In today’s digital age, it can feel as though we are drowning in a deluge of data, and the scientific field is no different. According to a 2014 study, one paper is published every 30 seconds, and more than 70 000 papers have been published on a single protein, a tumour suppressor called p53.
Given the challenge of manually keeping track of such a volume of information, let alone making use of it, it is unsurprising that a separate study found that 90 % of scientific papers are never cited, and only half are ever read by anyone except the authors, referees and journal editors.
Enter text and data mining (TDM). This is a technique that uses intelligent software to sift through large quantities of material, pull out the data and analyse it for patterns. The idea is that it can help scientists identify connections and contradictions that would otherwise be impossible to uncover.
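To make the idea concrete, here is a minimal sketch of the kind of pattern-finding TDM tools automate: counting how often pairs of terms co-occur in paper abstracts. It is an illustration only, not the method of any tool mentioned in this article; the corpus and keyword list are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus standing in for paper abstracts (illustrative only).
abstracts = [
    "p53 is a tumour suppressor protein linked to cancer pathways.",
    "Mutations in p53 disrupt apoptosis and promote cancer.",
    "Zika virus infection may alter apoptosis in neural cells.",
]

# A hand-picked vocabulary; real tools use large curated dictionaries.
KEYWORDS = {"p53", "cancer", "apoptosis", "zika"}

def tokenize(text):
    """Lowercase the text and return the set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

# Count how often pairs of keywords appear in the same abstract:
# a crude proxy for the hidden connections TDM tools surface at scale.
pairs = Counter()
for abstract in abstracts:
    found = sorted(KEYWORDS & tokenize(abstract))
    for pair in combinations(found, 2):
        pairs[pair] += 1

print(pairs.most_common(3))
```

Run over millions of abstracts instead of three sentences, the same counting turns up term pairs that no single reader would ever have noticed appearing together.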
‘Text and data mining is about extracting hidden knowledge from text, from all this volume of text that is laying around that no one is able to read,’ said Natalia Manola, a researcher at the Athena Research and Innovation Centre, Greece. ‘We are able to connect and infer knowledge that even an expert might not be able to see.’
Manola coordinates the EU-funded OpenMinTeD project, which is building a registry of TDM services and tools so that researchers are able to find an appropriate piece of software for their purposes and use it easily. Joining up TDM technologies with potential users in this way also has benefits for the designers of the software, who need access to research data to test and hone their algorithms.
‘There is a world of scientists and researchers who want to use text and data mining services but they don’t know how,’ said Manola. ‘And then there is the other side, people who are able to produce these tools and services, but somehow these have stayed within their labs, without a broad uptake from scientists, industry and the public. (We’re) trying to bridge this gap.’
Dr Peter Murray-Rust is director of ContentMine, a not-for-profit organisation which has developed software that enables researchers to search through scientific papers on a particular subject. He gives the example of the Zika outbreak as an area where TDM can help to enhance knowledge.
‘We’re going to need to know a lot more about Zika, and much of it may already be in the scientific literature that’s been published but that we don’t read. We don’t read it because there’s so much, so we’ve built a machine, ContentMine, that will liberate the facts from the literature.’
However, while TDM has been billed as the research method of the future, there is some indication that Europe is currently lagging behind its global counterparts in using the technology.
Marco Caspers from the Institute for Information Law in the Netherlands is working on the EU’s FutureTDM project, which is trying to identify what is preventing people from using text and data mining more.
‘There is some empirical evidence that shows that scientific output using TDM technologies is significantly less apparent in Europe than, for example, in the US,’ he said. ‘We are looking at what the cause of this could be.’
Caspers says that the danger of Europe’s inactivity is that it risks driving away innovative companies that want to develop or use TDM technologies.
‘Companies that are starting to explore this field will move out of the EU because they will have a better climate – maybe economically, maybe legally, maybe otherwise. They would be leaving the EU, which would affect the growth of the economy – because it is a growing sector.’
Some of the challenges the FutureTDM project is examining include how to set up the legal framework so people can use TDM technologies without worrying about violating data protection and privacy laws, whether data can or should come in a standard format, and how to ensure the quality of the data that’s retrieved.
One of the issues Caspers is looking at closely is copyright. Because TDM techniques may involve copying protected content, they can infringe copyright law, even when a researcher has the right to access that content.
He says that while the EU has a rule that allows reproductions to be made for scientific research with non-commercial purposes, it is not mandatory and very few Member States have implemented it in a way that allows text and data mining.
‘In many countries, TDM researchers do not even know if it would be legal to do any TDM,’ said Caspers. ‘It also affects cross-border collaborations (because) they are not sure if it would be lawful.’
To resolve this, the European Commission has proposed a copyright exception, meaning that European researchers and some innovators should have the explicit right to process on a large scale the content to which they have legal access.
The aim is to create legal clarity and make it easy for researchers to access content for TDM purposes without having to invest time and money in negotiating complex licences. It would also mean that the copyright situation is the same in all EU countries.
Currently the proposal covers public or private organisations that are carrying out scientific research in the public interest. However, many researchers would like to see the exception extended to companies, such as small- and medium-sized enterprises (SMEs), which are important not only for developing the technology to perform text and data mining activities, but also for using these tools to innovate.
‘We are aiming for a digital single market. If we’re not allowing TDM for SMEs we break the bridge between open science and open innovation,’ said Natalia Manola. ‘On one hand we are advertising and we want to attract SMEs, but how are they going to come to this?’