DNA Data Storage

Our precious selfies, snaps of our felines, and shots of our cappuccinos (to say nothing of our texts, emails, and songs) have created a massive data set. However efficient our current memory technology, that data takes up real physical space. The 44 trillion gigabytes we’re likely to create by 2020 would fill six stacks of tablets, each tall enough to reach the moon.

The problem is not limited to the trivial. Hospitals and intelligence agencies have huge and ever-growing quantities of data that need a safe, durable, and smaller home.

Now, researchers at the University of Washington in collaboration with Microsoft have proven it can all be made into DNA. Rewritten as base pairs, a warehouse of today’s data would fit into a thimble.

Already they have managed to encode 200 megabytes (which happened to include a video by the band OK Go, the Universal Declaration of Human Rights, and a hundred books from Project Gutenberg) into the molecule of life. “We’ve done the round of encoding the data into DNA, then reading it back, and then validating it to make sure there are no errors,” says Luis Ceze, a professor of computer science and engineering at the university, and the lead researcher for the project. “We’ve done this multiple times; encoded a bunch of data and recovered it perfectly, bit by bit.”

That perfection of the decoding is the feat that makes the seemingly futuristic technique practical. “Despite being incredibly reliable—it being the basis of supporting complex systems like living organisms—the process of writing and reading DNA is relatively noisy,” says Ceze. To overcome that noisiness, the team used “sophisticated error correcting algorithms and schemes.”
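The article does not say which error-correcting schemes the team used. As a much simpler stand-in, the sketch below shows one basic idea behind recovering data from a noisy channel: read the same strand several times and take a per-position majority vote. The function name `majority_vote` and the sample reads are hypothetical, and the sketch only handles substituted bases, not inserted or deleted ones.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Return a consensus strand from several noisy reads of equal length.

    Illustrative assumption: errors are substitutions only, so every read
    lines up position by position. This is not the team's actual scheme.
    """
    consensus = []
    for column in zip(*reads):
        # Pick the base that appears most often at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three hypothetical reads of the same strand, two of them with one error.
reads = ["CATTCAGT", "CTTTCAGT", "CATTCAGA"]
recovered = majority_vote(reads)
```

Here each error appears in only one of the three reads, so the vote restores the original strand. Two reads sharing the same wrong base at one position would defeat it, which is one reason practical systems rely on stronger codes.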

Data coded into adenine, cytosine, guanine, and thymine is likely to be as well preserved as it would be on a hard drive. “DNA can be very, very durable,” says Ceze. “If you dehydrate it, and keep it away from water, away from light, and away from heat, which is what you do with electronics in general, it lasts for centuries if not for millennia.”
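To make the idea of coding data into four bases concrete, here is a minimal sketch that assumes the simplest possible mapping: two bits per base. The mapping table and the `encode` and `decode` helpers are illustrative assumptions, not the encoding the researchers used, which the article does not describe.

```python
# Illustrative mapping only: two bits per base. This is an assumption,
# not the scheme used by the University of Washington and Microsoft team.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read the bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"OK")      # "CATTCAGT"
restored = decode(strand)   # b"OK"
```

With this mapping every byte becomes four bases, and decoding an error-free strand returns exactly the bytes that went in.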

The equipment with which Ceze and his team decoded their data is the same used in genomics and medical diagnostics. And it’s thanks to the incredible advances in DNA sequencing over the past decade—for those purposes—that Ceze was able to read his double helical storage. Cost has plummeted and speed has increased at a rate faster than Moore’s law. “The cost of DNA sequencing has dropped 100 million fold in the last nine years,” says Ceze.

But in terms of storage, it’s still on the costly side; about a thousand dollars a megabyte. “That’s, you know, millions of times more expensive than computer memory,” says Ceze. But for a proof of concept, it’s not too bad. And, with that proof now in the bag, the team will begin to focus on automation and bringing the cost way down. “There might be a market for a very expensive storage as long as it has the right properties,” says Ceze. “But our focus, to be honest, is to push the price down to levels that are comparable to archival technologies today. We’re just scratching the surface. There’s a lot to do.”

Michael Abrams is an independent writer.

Date Published: Aug 25, 2016