In the late 1990s, geneticists began studying extinct species’ DNA, analyzing hair and bone preserved in frozen tundra. At that time, most computers stored data on floppy disks that held just 1.44 megabytes, smaller than the average selfie. Today, those disks might as well be Ice Age artifacts, too. Not only is their storage capacity minuscule by today’s standards, but recovering their data is practically impossible, due to the degradation of their materials and the special equipment required to read them.
The floppy disk encapsulates some of the greatest long-term challenges to computer science. According to Microsoft principal researcher Karin Strauss, future storage will need exponentially greater density to hold the data we produce as electronic devices become a greater part of our lives. Plus, long-term archiving will depend on preserving data in a format that will remain readable, on materials that won’t degrade.
The answer to those challenges may lie in you, me and those same prehistoric beasts geneticists studied years ago. “DNA can last for a long time,” says Strauss, who is also a professor at the University of Washington. It can also store a lot of information in very little space: All the genetic instructions for a mammoth lie in a single molecule. By Strauss’ calculation, a whole data center would be no larger than a couple of sugar cubes. And since it’s the code used by all life on Earth, “we’ll always be able to read it,” she says.
The idea of storing data in DNA predates Microsoft and floppy disks, if not quite the woolly mammoth. DNA is a twisted ladder with rungs made of four different molecules that connect in pairs to hold the ladder together. The order of these molecules, known as bases, provides assembly instructions for the organism. In the late 1960s, scientists realized that DNA could carry other information if researchers could dictate the bases’ order and machines could read that order. Thanks to advances in genome sequencing and genetic engineering, these processes have finally become efficient in the past couple of decades.
Computers have also evolved to become more powerful. Still, nobody knew how to efficiently retrieve precise bits of information from DNA. That task is “not trivial,” says UW computer scientist Luis Ceze, who directs Microsoft’s research initiative with Strauss.
This year, in a joint effort by Microsoft and UW, Strauss, Ceze and their colleagues demonstrated how DNA could support future data centers. The team combined software that encodes and decodes data into DNA with machines that produce genetic material and prepare it to be read by the software. With that system, they managed to store and retrieve the word “hello.” The whole process took 21 hours, but, critically, it was totally autonomous. “For DNA storage to be practical, we need to remove the human from the loop,” says Strauss. Her robot is the first proof-of-concept for a whole new species of computing.
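To get a feel for what the software side of such a system does, here is a minimal sketch in Python. It is not the Microsoft and UW team's actual encoding, which the article does not describe. It assumes the simplest possible scheme, in which each of the four bases stands for two bits (A=00, C=01, G=10, T=11), and it leaves out the error correction and sequence constraints a real system would likely need. The function names are hypothetical.

```python
# Assumed mapping for illustration only: base at index i stands for the
# two-bit value i, so A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Reverse encode(): read four bases at a time and rebuild each byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

Under this toy mapping, the five letters of “hello” become a strand of 20 bases, and reading those bases back in the same order recovers the original word.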
Still, some scientists question whether DNA is the best molecule for the job. “The structure of natural DNA came from … four billion years of Darwinian evolution,” observes Steven Benner, a distinguished fellow at the Foundation for Applied Molecular Evolution. In that time, DNA has developed a lot of evolutionary baggage that can get in the way of smooth operation in computers, like physical differences in how base pairs behave. To address this, Benner has recently developed four artificial bases that work similarly to DNA’s bases, but don’t have those inherited differences.
Strauss readily acknowledges the baggage, and the long-term potential of Benner’s bases. But she points out that those billions of years of evolution have provided a good starting point. Equally important, she notes, there’s a vast biotech industry developing the machinery that can help bring DNA storage from the lab to the data center. “I think DNA is the best first molecule for molecular information technology,” she says.