Horizon: the EU Research & Innovation magazine | European Commission

It’s in the genes – data storage turns to DNA

Nick Goldman is a scientist at the European Bioinformatics Institute at Cambridge in the UK © EMBL

‘It’s in your genes!’ How often have you been reminded by friends or relatives that you look the way you do because of the genetic code stored in your DNA? But next time you hear this expression used, you might stop to wonder what else could be stored in those genes.

According to the latest research from the Cambridge-based European Bioinformatics Institute (EBI), DNA is capable of more than storing genetic information: it also has the potential to store massive volumes of man-made data.

The research is now getting EU funding that could go towards refining the technique so that it could be scaled up to store all of the data that exists on Earth – estimated to be three zettabytes, or 3 000 billion billion bytes – which, for those who don’t think in ‘bytes’, is roughly equivalent to a pile of 750 billion DVDs.
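As a rough check on the figures above, the short Python sketch below works out what capacity per DVD the comparison implies. The variable names are illustrative and the decimal definition of a zettabyte (10^21 bytes) is an assumption; the article does not say which DVD capacity was used.

```python
# Back-of-the-envelope check of the storage comparison (illustrative names).
ZETTABYTE = 10**21                 # assumed decimal definition, in bytes

total_bytes = 3 * ZETTABYTE        # "three zettabytes"
dvd_count = 750 * 10**9            # "750 billion DVDs"

# 3 000 billion billion bytes is the same quantity written out in words.
same_quantity = 3000 * 10**9 * 10**9

bytes_per_dvd = total_bytes // dvd_count
print(bytes_per_dvd / 10**9, "GB per DVD implied by the comparison")
```

Under these assumptions the comparison works out to about 4 GB per disc, which is in the range of a single-layer DVD.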

In the future, a cup of DNA could store 100 million hours of video.

Storing information in a minuscule form that cuts down on space and does away with the need for energy-guzzling and costly hard disks would be a timely innovation in the digital age. As more and more data is generated, the need for economical and durable forms of data storage also rises.

It was this pressing issue that prompted the key authors of the EBI research project, Nick Goldman and EBI Associate Director Ewan Birney, to act.

‘At the Institute, we share biological data with other scientists to improve their insights into life,’ said Goldman. ‘We add value to it and send it back into the research community via the Internet. But we realised that, as the volume of biological data we receive grows exponentially, our budget to handle and store it does not. Disks are expensive. We needed to find a way of storing large volumes of data in a small space, cheaply – and ensure that it could be retrieved efficiently.’

The pair hit upon their approach to resolving the problem three years ago. ‘Ewan and I were chatting one evening after a work conference in Hamburg. We were joking about, thrashing out ideas for alternative data storage methods,’ said Goldman. ‘And then, after we’d batted a few ideas back and forth, we just turned to each other and said, “How about using DNA?”’

Much of the funding for such research at the non-profit EBI comes from the European Union, under the Directorate-General Research & Innovation’s Sixth and Seventh Framework Programmes. In 2012, the Institute received EUR 7.3 million from the European Commission.

Before they started, Goldman and Birney put together a project research team at the EBI, which forms part of the EU-wide European Molecular Biology Laboratory (EMBL). They also enlisted another actor – Agilent Technologies, a California-based biomedical technologies company with expertise in writing DNA – to complete the research network. ‘Agilent saw it as a challenge and a fun piece of research,’ says Goldman. ‘They provided the required DNA samples to us for free.’

Shall I compare thee to a DNA?

‘We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and still make sense of it. It is also incredibly small and dense, and it does not need any power for storage, so shipping and keeping it is easy,’ Goldman said.

The experiment to see if they could actually use DNA to store information took place in three stages:

1. First up were the EBI team. ‘Our role was to invent a DNA code into which digital information could be translated,’ said Goldman.

Typically, a file on a computer hard disk is stored in binary code, comprising zeros and ones. The computer ‘knows’ the rules of the code and translates the information it receives accordingly. It was up to the EBI team to rewrite the binary code into a DNA sequence on a computer file.

The coding system of DNA – or deoxyribonucleic acid – is built on four nitrogen bases, identified by the letters A (adenine), C (cytosine), G (guanine) and T (thymine). The trick was to write a DNA sequence in which the same letter never appeared twice in a row, since runs of identical bases are harder to read back accurately. Another way of decreasing the risk of errors was to write only short strings of DNA.

‘We figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail – and that would be very rare,’ Birney said.

2. Once they had their DNA sequence design in place, they used it to encode an MP3 clip of Martin Luther King’s famous ‘I have a dream’ speech, a photo of the EMBL-EBI lab, an image of the famous DNA double helix structure as identified by James Watson and Francis Crick in 1953, and a text file of all 154 of Shakespeare’s sonnets.

The encoded computer files were sent to Dr Emily Leproust of Agilent Technologies in California. ‘We downloaded the files from the web and used them to synthesise hundreds of thousands of pieces of DNA. The result looks like a tiny speck of dust,’ Leproust said.

During the synthesis process, Agilent manufactured DNA that matched the DNA sequence sent to them by the EBI. Using technology that is a bit like an inkjet printer, they fired the encoded DNA in the form of minuscule droplets onto a glass microscope slide. The fluid was then freeze-dried and the resulting speck of dust containing 739 kilobytes of data was flown back to Cambridge.

3. Reconstituted in water, the substance was shipped on to the EMBL’s Heidelberg office in Germany, where it was read back by sequencing machinery and the digital information reconstructed with 100 percent accuracy, the researchers said.
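For readers who want to see how the ideas in stage 1 could fit together, here is a minimal Python sketch. It is not the EBI team's actual code. It assumes one possible way of meeting the no-repeat rule: each byte is written as six base-3 digits, and each digit picks one of the three letters that differ from the previous letter. The fragment length (100 letters), the step between fragments (25 letters, so most positions sit on four fragments), the starting letter, the majority vote on read-back and all function names are likewise assumptions made for illustration. For simplicity the index is kept alongside each fragment rather than written into the DNA itself.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data, start="A"):
    """Each digit selects one of the three letters unlike the previous one."""
    prev, seq = start, []  # 'start' is a hypothetical virtual previous letter
    for trit in bytes_to_trits(data):
        prev = [b for b in BASES if b != prev][trit]
        seq.append(prev)
    return "".join(seq)


def decode(dna, start="A"):
    """Recover the digits from the letters, then the bytes from the digits."""
    prev, trits = start, []
    for base in dna:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    return trits_to_bytes(trits)


def reverse_complement(seq):
    """Read a strand in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(dna, length=100, step=25):
    """Cut overlapping, indexed fragments; odd-numbered ones are reversed."""
    pieces = []
    for index, begin in enumerate(range(0, len(dna), step)):
        piece = dna[begin:begin + length]
        flipped = index % 2 == 1
        pieces.append((index, flipped,
                       reverse_complement(piece) if flipped else piece))
    return pieces


def reassemble(pieces, step=25):
    """Majority vote at every position across the fragments covering it."""
    votes = {}
    for index, flipped, piece in pieces:
        if flipped:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes.setdefault(index * step + offset, Counter())[base] += 1
    return "".join(votes[pos].most_common(1)[0][0] for pos in sorted(votes))


message = b"Shall I compare thee to a summer's day?"
dna = encode(message)
restored = decode(reassemble(fragment(dna)))
print(restored == message)
```

In this sketch a single misread letter in the middle of the sequence is outvoted by the three other fragments covering the same position. Positions near the very start are covered by fewer fragments, so a real scheme would need extra protection there.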

The EBI exists in large part thanks to funds received from the EMBL’s 20 member states, but Goldman sees EU funding as playing a vital indirect role in expanding its work. ‘In this research project, for example, we really benefited from being able to call on team members whose skills had been honed on schemes funded by the EU and who could assist in data analysis and data modelling. Sometimes, of course, the EBI gains essential hardware through funding, but here it was the EU’s “investment in people” that counted for us.’

More of a long-term thing

So, do the results of their research mean the end of the hard disk? Not quite yet. At the moment, the team sees its main application as storing information that needs to be archived for a long period of time and accessed on an infrequent basis.

‘From a cost point of view, DNA data storage really comes into its own over the long term,’ says Goldman. ‘The one-off cost for DNA sequencing is still very high. But once that expenditure has been made, it becomes a very cheap way of archiving information. With DNA, maintenance costs are minimal as the cost of endlessly retransferring information from one outdated medium to another – such as video tape to CD – can be dispensed with. It costs virtually nothing to store and, unlike video tape, which degrades rapidly with time, lasts thousands of years.’

People will start using DNA to store data within the next 50 years, Goldman believes, as the cost of DNA sequencing goes down.

‘Right now, I could see it as providing an excellent way of storing data that is now held on magnetic tapes – it’s not impossible to imagine that those vast dusty archives of tapes, whose corridors are currently patrolled by data retrieval robots, could be done away with once and for all with our method.’
