Your local natural history museum probably displays its most impressive fossils front and center, ready to awe visitors. But some museums also house troves of other remains, filed away in drawers, that could reveal helpful details about their eras of origin. Usually, there’s no scientific literature describing these fossils, and only museum workers have access to them. Researchers call these unstudied items “dark data.”
In an effort funded by the National Science Foundation, paleontologist Peter Roopnarine, a curator of invertebrate zoology and geology at the California Academy of Sciences, and his colleagues across the country are trying to bring that dark data into the light. For nearly a decade, they have worked to catalog those forgotten fossils with full digital records.
A recent Biology Letters paper by Roopnarine and his team highlights the need for such a project: In some California collections, dark fossils yielded 23 times as much data as the published scientific record.
Discover: First, can you tell me about the shortfalls of the current system of documenting fossils?
Roopnarine: We’ve been publishing on fossil occurrences, descriptions and discovery for a couple of hundred years. That information is in the literature, but compiling it is very difficult, because it’s spread out across many journals in many different languages. It represents, really, only a fraction of all the material that’s been collected, and a lot of this material is sitting in museums around the world.
So why has this effort to digitize the records only happened in the last several years?
Some museums have been putting their catalogs online for the past 20 years. However, we’ve lacked the technology to bring all of them together. Everyone uses a different system that they’ve developed, but we need a common language to describe materials in museums. Once we have a codified language, we can put our data into that common language so we can all understand our various catalogs and collections. These are very difficult problems, and even once you overcome them, there’s just no way to get around the fact that you need warm bodies to sit there and make all of these data available in an electronic form.
What kind of measurable progress have researchers made on digitizing dark data so far?
We looked at the [NSF-funded] Paleobiology Database, which has compiled all of the fossil localities in California that have had material published in the scientific literature. Then, we compared that to the total number of localities that we have documented from our museum collections, many of which we know have never been published on. Our recent compilation has 23 times as many localities and data points as have ever been accessible to the scientific community.