The idea of creating a shared online repository that would make all data from publicly funded research available for anyone to investigate and use sounds like a laudable and ambitious plan. But how exactly would a European open science cloud (EOSC) work in practice? On 28 and 29 November, data experts, policymakers and scientists gathered in Brussels, Belgium, to discuss the way forward. Horizon went along and here are nine things we learned.
So said the EOSC’s lead architect, Donatella Castelli of the National Research Council of Italy (CNR-ISTI). She was speaking at the opening session of the event organised by the EOSCpilot project, which has been set up to support the development of the EOSC. By using digital technologies and new collaborative tools to share data and services, science can become truly open, done not only by professionals but also by amateurs – the so-called citizen scientists. However, changing the way science is done also means changing the legal and policy context in which it operates. One key ingredient of open science is data, which is currently governed by national laws, creating a lot of complexity for open science experimenters. The other key ingredient is technology, which means creating rules for the interchange of data and resources across borders and between researchers.
We may think that we are swamped by data now, but there is a lot more data to come. While the storage on our phones and laptops is measured in megabytes and gigabytes, big data is petabyte-scale. A petabyte is 1024 terabytes or more than a quadrillion bytes, and the international ITER fusion reactor project in France will soon be generating 2 petabytes of scientific data per day in the drive to create a new, sustainable power source. The challenge for open data scientists is not only to store so much raw data but also to find a way of extracting the nuggets of valuable information. More data across multiple domains means an enormous task ahead for a project that aims to link it all together.
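To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the 2 petabytes per day comes from the text above, while the 365 days of continuous operation is an assumption made purely for illustration.

```python
# Back-of-the-envelope check of the data volumes quoted in the article.
BYTES_PER_TERABYTE = 1024 ** 4
BYTES_PER_PETABYTE = 1024 * BYTES_PER_TERABYTE  # a petabyte is 1024 terabytes


def petabytes_to_bytes(petabytes):
    """Convert a number of petabytes into bytes."""
    return petabytes * BYTES_PER_PETABYTE


PETABYTES_PER_DAY = 2   # daily figure quoted in the article
DAYS_PER_YEAR = 365     # assumption: data is produced every day of the year

yearly_petabytes = PETABYTES_PER_DAY * DAYS_PER_YEAR

print(BYTES_PER_PETABYTE)             # 1125899906842624
print(BYTES_PER_PETABYTE > 10 ** 15)  # True: more than a quadrillion bytes
print(yearly_petabytes)               # 730
```

Under that assumption, a single experiment would add around 730 petabytes a year before any other discipline contributes a byte.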
The EOSC is currently in a pilot phase. By creating science demonstrators - mini open science clouds in specific disciplines such as Earth sciences and high-energy physics - and developing experiments, all kinds of issues come to the surface that can then feed into the design phase. It’s 'a bit like a requirements study,' said the EOSCpilot’s leader and Horizon interviewee Dr Juan Bicarregui. The science demonstration projects hint at the potential of open science and highlight the key technical, organisational and policy challenges of building an international, community-wide cloud network to share resources. It's an example of the evolutionary nature of the development of the EOSC, which has been described as a 'learning by doing' exercise.
One of the science demonstrators, called Pan-Cancer Analysis, is allowing scientists to re-analyse existing Dutch population data to find links that have been hidden to date and uncover new insights into cancer. In Italy, a project is using cloud technology to digitise old texts, index their contents and make them available to other researchers to work on. Other areas covered by the demonstrators, and on show at the conference, include high-energy physics, astronomy and Earth sciences. Because each data set is unique and proprietary, each demonstrator requires tailored computer science to enable the experiments to be cloud-ready.
Sharing is FAIR-ing. FAIR data is findable, accessible, interoperable and re-usable, and the principle guides the handling of sensitive data and the development of metadata. Metadata is the glue that holds the data together and, to be FAIR, data must be tagged, managed, filed and connected in a consistent manner, now and into the future. The FAIR principle acts as a compass for getting to a working EOSC, but applying it is a huge challenge. To illustrate the scale of the problem, just one domain (seismology) conducting open science for better earthquake prediction creates a billion pieces of metadata per year - how is this to be managed and made FAIR?
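What consistent tagging might look like can be sketched with a toy metadata record. The example below is a minimal illustration only: the field names, the `DatasetRecord` class and the `missing_fair_fields` helper are hypothetical and do not come from the EOSC or any real metadata standard.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a metadata field to the FAIR aspect it supports.
REQUIRED_FIELDS = {
    "identifier": "findable",
    "access_url": "accessible",
    "data_format": "interoperable",
    "licence": "re-usable",
}


@dataclass
class DatasetRecord:
    """Toy metadata record; the fields are illustrative assumptions."""
    identifier: str = ""
    title: str = ""
    access_url: str = ""
    data_format: str = ""
    licence: str = ""
    keywords: list = field(default_factory=list)


def missing_fair_fields(record):
    """Return the FAIR aspects for which the record has no value."""
    return [
        aspect
        for name, aspect in REQUIRED_FIELDS.items()
        if not getattr(record, name)
    ]


# A made-up record that lacks a declared data format.
record = DatasetRecord(
    identifier="example-id-0001",
    title="Hypothetical seismic waveform archive",
    access_url="https://example.org/data/0001",
    licence="CC-BY-4.0",
)

print(missing_fair_fields(record))  # ['interoperable']
```

Even a check this simple hints at the scale of the task: it would have to run, and keep passing, across a billion records a year in seismology alone.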
It is a misunderstanding to think that all of the data goes into a big pot in the middle - it doesn’t. Ownership always stays at the local level so the EOSC will not own any data. As well as joining up different data infrastructures into one big network and plugging any gaps, the EOSC’s role is to help researchers by providing services. These include large-scale computer processing power and a set of rules underpinning open data. The question many delegates had was how much procedural involvement the EOSC should have. Most people thought a light-touch approach was the best way to ensure scientific engagement and progress.
The EU’s General Data Protection Regulation (GDPR) is coming in 2018 and it will change the way data is handled in Europe. It will enable a smoother, unified data handling regime in the EU, while giving people back control of their personal data. The new regime governs data protection, data collection and data use, and any company, regardless of location, that wants to trade in the EU will be subject to the law. Companies need to be compliant with the new regulation when it comes into force in May 2018 after a two-year transition period, so with 11 chapters and 99 articles, the GDPR has many IT managers preoccupied at present. For the EOSC, it means dealing with one piece of legislation rather than 28, a big advantage for a pan-European project.
If you’ve ever been locked out of your social media accounts or email, you know how frustrating that experience can be. Something as basic as a failed login attempt means many researchers are discouraged at the first hurdle and lack trust in the system. Human factors such as ease of use, added value and, critically, identity management that makes logging in as frictionless as possible are part of the goal of making the EOSC human-centric, as Silvana Muscella, chair of the High Level Expert Group that oversees the whole initiative, urged in her closing keynote address. Trust and familiarity are cornerstones of engagement and, while the EOSC is unlikely to be as easy to use as Facebook, the user experience should not be a barrier to adoption.
The EU would like to see the European Open Science Cloud become a reality by 2020. In total, around €272m of the Horizon 2020 budget for 2018-2020 will go towards open science. So far, 70 scientific institutions have endorsed the EOSC Declaration about that goal.