Hindi user guide, film with barcode: Here’s how GitHub is archiving code for a thousand years.
New Delhi: The GitHub Archive Program's Arctic Code Vault in Norway has just been loaded with a backup of open source software repositories that is meant to last at least 1,000 years.
A few centuries from now, someone could dig up a silver halide film reel in an ancient coal mine inside the Arctic Circle and stare at the code of Microsoft MS-DOS with the same curiosity millennials today have for Egyptian hieroglyphics.
“We believe it is worthwhile to preserve all the open source software because so much of our life today depends on open source software, whether it is my cell phone that I use to order groceries or watch a movie or communicate with my family and my friends all around the world,” reasoned Thomas Dohmke, Vice President for Special Projects at GitHub.
Dohmke said the vault in the archipelago of Svalbard, next to the Global Seed Vault, was part of GitHub's layered approach to archiving all open source software. The vault deep in the permafrost forms the cold layer, where a snapshot taken on February 2, 2020 (2/2/2020) has been stored. Dohmke said the backup taken that day was saved on a couple of hard drives and shipped to GitHub's Norwegian partner Piql for printing on film reels. “Two weeks back, they shipped the files to the vault,” he said, expressing some disappointment at not being able to be there for the occasion because of Covid travel restrictions. There is also a hot layer, which is a live streaming backup, and a warm layer, in which backups are saved monthly and quarterly.
But planning an archive to last a millennium is much more than a tech challenge. The technology of today might make no sense to future generations, and there are even more basic questions to consider, such as which language to use to future-proof the concept. “At GitHub we are all software developers and not archiving experts. So we looked for a panel of advisors and partners who have been advising us for the last year of what the right approaches to archive data would be,” explained Dohmke.
So GitHub is now working with archaeologists, archivists, linguists and scientists to figure out the best way to approach the problem. It also gets help from the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive, Microsoft Research, the Bodleian Library, and Stanford Libraries, he added.
“We looked at who else was doing something similar and found this company in Norway already offering archiving solutions. So we didn’t invent the archive, we found someone already doing it.” The old coal mine where the vault is situated already holds archives from UNICEF and the Vatican Library. It also sits on a hill, which guards against flooding from, say, rising water levels or melting Arctic ice. The film reels used in the GitHub archiving project were stress tested to see if they could survive a thousand years.