In the attic of my grandmother's house there is a box with my old personal stuff. Inside there is a smaller box with a bunch of 3.5-inch floppy disks. One of them has a fading handwritten label: "Bachelor thesis - data".
To get to the data, I could use the 3.5-inch drive in a 1990s minitower resting in my wife's parents' cellar. The only problem is that it won't boot. Even if I fix that, will the disk still be readable? Further, if my children wanted to look at the data 20 years from now, would they even know what a minitower is? To them, a 3.5-inch floppy disk may be as impenetrable as Egyptian hieroglyphs were before the discovery of the Rosetta Stone.
I guess that almost all science that has ever been stored on floppy disks, magnetic tapes, or punched cards is now practically inaccessible.
In this context, Jeremy Fox asks:
… in what concrete ways do you feel worse off because you can’t access the data that past ecologists stored in a now-unreadable format on 5.25″ inch floppies? Do you ever have occasion to curse their lack of foresight?
Well, I don’t feel worse off, and I don’t curse past ecologists for using floppies. I am OK today because most of 20th-century science was actually printed, and hard copies have one advantage over binary data: they are readable without any special device, because humans can read them directly. I can still get Taylor & Woiwod’s 1982 estimates of mean-variance scaling in insect populations from their grand table, printed over several pages of the Journal of Animal Ecology.
Today we consider electronic data to be somehow more secure and accessible than in the floppy nineties. We are so confident about the solidity of the cloud that we have been moving the entire scientific infrastructure into it. Countless papers are published, heaps of data are deposited online, new journals emerge, and it all exists solely as magnetic bits that degrade over time, or on flash drives that wear out with write cycles. And to read any of it, we always need a rather sophisticated device: a personal computer with up-to-date software.
But keeping software up to date can be a problem as hardware ages. Operating systems change, and there is no guarantee that our current hardware will still be operable in 50 years. For example, proprietary operating systems (such as MS Windows) tend to drop support for old hardware and do not allow downgrading to older versions. File formats evolve, too. There was no .xlsx when I was a kid; now everybody uses it and nobody cares about .dbf. Although this does not seem to be an issue now (we can still open .dbf files with Excel and other programs), I can imagine Excel 2050 dropping .dbf support because it has little commercial value, or just to make things simpler.
In the past, libraries received hard copies of the journals they subscribed to, creating a globally distributed hard-copy backup of scientific knowledge with no restrictions on future use. The material remained accessible even after a subscription ended. In contrast, the current trend is towards online-only access: libraries pay for temporary access to remote repositories on publishers’ servers. When an institution can no longer afford access to Elsevier’s journals, all of their content suddenly becomes inaccessible.
It bothers me that nobody seems to care about what happens, in the long run, to all the electronic science produced today. Will my grandchildren be able to open the data that have been uploaded to the clouds? Will the clouds still be there? Will people have access to Wiley Online Library or to JSTOR? Will these organizations still exist? Will future generations be able to read .pdf or .xlsx files? Will they, in a hypothetical post-nuclear (post-global-warming, post-biodiversity-loss) future, have computers at all? And what about the grandchildren of our grandchildren?
Empires fall, climate changes, catastrophes happen. That is the historical experience. Americans are just about to vote on whether to give a maniac access to their nuclear arsenal. And as the saying goes: just because you’re paranoid doesn’t mean they aren’t after you. Sometimes I even imagine a Planet of the Apes future in which someone finds a laptop covered in moss, mysterious, with whole libraries stored (but probably corrupted and fragmented) on a tiny object inside, invisible and inaccessible.
Maybe I am exaggerating this. Jeremy Fox writes:
The “needs” and “wants” of long-distant future people aren’t well-defined today. What they “need”, and “want”, and what they do to meet those needs and wants, will depend on what we do today – but in complex and unpredictable ways. At the timescales you’re talking about, there’s nothing but Rosetta Stones. Nothing but sheer luck.
Perhaps. We can’t foresee the distant future. But I still believe that we can take some relatively small and easy precautionary steps that could (with a bit of luck) help our grandchildren access our science.
- To scientists: Use simple, non-proprietary, text-based data formats (e.g. .csv) when depositing data. Use simple text-based file formats for writing (e.g. Markdown, LaTeX, HTML). Use free and open-source operating systems (e.g. GNU/Linux) and software (e.g. R).
- To journals, publishers, editors: Think twice before going 100% online. If that can’t be prevented, maybe print everything once a year in a small print run, and send the hard copies to selected libraries anyway.
- To funding agencies and governments: You often require that research funded with your money be open access. How about adding another requirement: the research results should also be properly archived in some durable form. Somewhere.
- To libraries: I see two roles for a library. It provides access to literature, and it archives it. The latter role could be strengthened, and libraries could take an active role in archiving the electronic literature they subscribe to.
- To engineers: Please invent a cheap technology for storage of large amounts of data that does not degrade over time.
- To historians (and to Peter Turchin): Some ancient literature has made it to the present. Please investigate why some of it survived and some of it didn't. Was it nothing but sheer luck? There may be some fundamental patterns and lessons in there.
- To artists: Please carve the Unified Neutral Theory of Biodiversity, the Metabolic Theory of Ecology and the Maximum Entropy Theory of Ecology into granite. Mount Rushmore could be a good location.
- To museums: Collect present-day computers.
- How about this, but for scientific data and literature?
Motivation for this post came from an exchange under a post by Brian McGill on Dynamic Ecology.