By Michael W. Harris
There is little doubt that our lives are now fully enmeshed with our digital technology. From digital assistants like Alexa and Siri, to wearable technologies that track our health and steps, to the smart house that functions more and more like the computer on the Enterprise with every passing update, all of us put our trust in these technologies and the “cloud” with very little thought. At the risk of being branded a Luddite, I would argue that there is reason for concern about all these developments, though these concerns must always be balanced against the benefits they can bring to society.
However, among all the potential pitfalls and causes for concern about the phone in your pocket, the tweet that you write, or the obscenely long terms of service that you never read and click “accept” on, the most disturbing may be what will happen to the information and content we are creating at an astonishing rate yet do very little to protect and preserve. Just as the twentieth century saw an explosion in the rate of paper record creation that caused a crisis in archival practice we are still reckoning with, the twenty-first century has handed us the impossible task of dealing with countless petabytes (zettabytes, even) of born-digital data sitting on media ranging from floppy disks to server farms.
What do we do with this information? How can we preserve it? Should we even try to preserve all of it? Who will decide what is saved, and how? Who even owns it? How can we reckon with an entire age that has created more data in a few years than the whole of human history that came before it? While we cannot save everything, we are trying. But questions of how, by whom, and whether we should at all make the race to preserve our current information age more difficult than any preservation project previously attempted in human history.
Black Holes and Preservation
In February 2015, Google Vice President and “Chief Internet Evangelist” Vint Cerf made a dire prediction about the ability of future generations to remember our particular moment in history: the century of transition from the analog to the digital world. In remarks before the American Association for the Advancement of Science, Cerf warned that we are in danger of a “forgotten generation, or even a forgotten century” because of the dual threats of technological and software obsolescence combined with the ticking clock of “bit rot.” (N.B.: Bit rot is the gradual loss of digital information as the bits that encode files degrade.)
Cerf expanded upon his remarks in an interview with The Guardian, stating, “We are nonchalantly throwing all of our data into what could become an information black hole without realising it. We digitise things because we think we will preserve them, but what we don’t understand is that unless we take other steps, those digital versions may not be any better, and may even be worse, than the artefacts that we digitized.” Cerf’s description of an “information black hole” captured the public’s imagination that month, and the term, more widely rendered as the “digital black hole,” has remained lodged in the media: the void to which our digital lives will be consigned, and whose eventual loss to bit rot and obsolete software dooms us to oblivion, our lives and memories lost to the ether.
Cerf’s proclamation received widespread coverage, with outlets including The Independent, The Financial Times, The Atlantic, and the BBC picking up the story. The Guardian followed up its initial story three days later with an in-depth exploration that used the phrase “digital black hole” in its title. The Atlantic also published a long-form piece in October 2015 that explored what could be at stake without proper archiving of the internet. The Atlantic piece included interviews with Internet Archive chief Brewster Kahle along with an illustrative example of a piece of award-winning journalism that was almost lost when a newspaper folded: a large component of the reporting had been posted only in the on-line version of the story, which was left homeless once the paper’s site was shut down. But for all of the handwringing by the press, this was not the first time these ideas had been expressed, either by Cerf or by other technologists and journalists.
In a New Yorker piece from January 26, 2015, Cerf is quoted from an email using essentially the same phrase: “informational black hole.” The article explores the work of the Internet Archive and Kahle, but it also asks the larger question of whether we can actually preserve all of the internet for future generations of scholars, researchers, historians, and our own cultural posterity. It goes to great lengths to detail the tremendous forces that make such an effort exceedingly difficult.
Cerf’s warnings of the “informational black hole” are tied to his calls to develop a “digital vellum,” a preservation format that would remain accessible and endure for centuries, like the vellum of old. As Cerf explains, the problem is not just bit rot or link rot or content drift or even people deleting Twitter posts. No, the problem is much bigger: older file formats become unusable when the programs that created them become obsolete and are replaced by programs that can no longer open their files. For Cerf, a digital vellum would solve the problem because it would provide not only long-term storage but also either interoperability of the files with newer programs or the ability to recreate the program and system on which they were originally created.
The existential crisis of the digital black hole did not start with Cerf in 2015, though. Indeed, it seems to have begun almost as soon as the World Wide Web itself was created. Brewster Kahle articulated such a fear when he first conceived of the Internet Archive, which began collecting old versions of web pages almost as soon as the commercial web began in 1994. In an article for the New York Times in 1998, Ashley Dunn wrote, “There is an overwhelming sense of the temporary on the Internet. Everything seems to exist there for a short period of time, eventually to be replaced by endless updates, revisions and remakes that wash over the present, leaving a trail of dead links in its wake. The virtual world is so easily manipulated that we create and destroy it with abandon. We scatter pieces of our lives like dust, rearranging, manipulating, deleting and copying material as if it were just, well, cheap bits that have little connection with the preciousness of their real counterparts.”
In her piece, Dunn writes about the loss of personal websites, the stories of early internet adopters, once their creators die and the sites are no longer maintained. She mentions Kahle and the Internet Archive, but worries that smaller sites will be lost: the stories of individuals whose sites are not deemed important enough to be archived by Kahle’s project. She discusses a project called Afterlife that would be like the Internet Archive but for these smaller sites. Ironically, afterlife.org itself seems to have fallen by the wayside; the site does not appear to have been updated in a decade or more. In thinking about the preservation of these stories, Dunn draws upon a powerful trend in archives to seek out and document the lives of those who have traditionally been left out of histories centered on the rich and powerful. And she closes her piece with the death monologue from the end of Blade Runner, which articulates our fear of being forgotten in language few of us could ever match: “All those moments will be lost in time, like tears in rain.”
Ius oblivione delebitur
In ancient Rome, one of the most severe punishments that could be handed out was called damnatio memoriae, or condemnation of memory. Essentially, it was an order that all records of a person’s existence be removed. This punishment was reserved for those considered to have committed treason against Rome or otherwise deemed enemies of the empire. Upon the death of those sentenced to damnatio memoriae, their names would be removed from written records, including those engraved in stone, their likenesses would be removed from paintings and carvings, and their statues would be torn down. All evidence of a person’s existence and achievements would be wiped from the annals of history. And while such punishment was usually reserved for the elites of Rome (emperors, senators, and so on), today we are seeing its possible application on a wider scale on the internet, albeit with a twist: it is self-imposed.
This “right to be forgotten,” or more specifically the right to be “forgotten on-line,” is not a new concept, but in its application to our digital lives, it was first proposed by the European Commission in 2012. As historian Antoon De Baets explains, this internet right “stipulated that natural persons would obtain the right to have publicly available personal data erased and not further disseminated when they were no longer necessary in relation to the purposes for which they were collected.” In layman’s terms, if the information was outdated or no longer relevant, then it should go away and not follow you around. So, if an ill-advised escapade in your twenties put your mugshot on-line, it should not remain publicly available once the charges have been dropped and you are well into your thirties.
This privacy debate is decidedly more recent, only finding its way into popular discussion around a year before Cerf’s comments on the informational black hole. It captured international headlines when a 2014 ruling by the Court of Justice of the European Union carried the issue from Europe to the United States. In short, the ruling required Google, and by extension other search engines, to delist links to certain webpages, whether news stories about past indiscretions or pages that otherwise reveal personal and private information. The actual content is not taken down, but it can no longer be found through a search in Google or Bing.
What is striking is the language used in many of the articles discussing this issue. In contrast to the sheer panic over lost information in the articles that spun out of Cerf’s comments in 2015 and after, writing about a person’s right to be forgotten carries a certainty that the internet never forgets. Suzanne Moore, writing for The Guardian on August 7, 2017, said that people are “learning the hard way that once something is online, it never really goes away.” And in 2015, Farhad Manjoo wrote in the New York Times that “the Internet never forgets, and, in its robotic zeal to collect and organize every scrap of data about everyone, it [has wreaked] havoc on personal privacy.”
However, balancing a person’s right to privacy against the public’s right to know has long been a fine line to walk. De Baets says that in Europe, “legal decisions used to balance the right to privacy against the right to be informed, itself a part of the right to free expression…[but the 2012] draft [outlining the right to be forgotten] tends to replace this balance by granting a central place to privacy.” What, however, is lost in this reweighing of the scales? And how easy would it actually be to enforce such regulations? As recent statements by public figures have shown, proving whether or not an event happened can be tricky enough, and the difficulty is greatly compounded by how easily content on the internet can be deleted and overwritten. Brewster Kahle’s Internet Archive preserves some of the internet for posterity via its Wayback Machine, and the intrepid have learned to screenshot tweets that might later be deleted by the regretful, but much is lost in the interim, and much more could be at stake should a widespread right to be forgotten on-line take hold that overrides the public’s right to information. For academics, writers, reporters, and many others whose work is research-based, this could be devastating. De Baets summarizes this swing of the pendulum toward privacy and away from access to information this way:
Oddly enough, some seem to think that when persons are able to invoke a right to be forgotten, they will also be encouraged to freely express themselves because their opinions are then reversible. In contrast, I think that a generic chilling effect is more likely. The protection of A’s privacy bolsters A’s free expression, but A’s right to be forgotten, as a radical offshoot of A’s privacy and regulator of sources about A, chills B’s rights to information and expression. A right to be forgotten disproportionally distorts the balance between free expression and privacy in favour of privacy in the already privacy-favourable European context. It will encourage data controllers to err on the safe side.
In this way, what was once among the harshest punishments Rome could inflict on those it deemed enemies of the empire (or republic) is now partially enshrined in EU law, as one that a person chooses to impose upon themselves. Where our society goes from here has yet to be decided; however, the next question to come into play regarding the control and preservation of information about ourselves on-line could very well be who actually owns the content we create.
Your Posts Are Not Your Own
So, on the one hand we have the incredible ease of access to information and the looming specter of its loss. On the other hand we have the possibility of an expanding definition of privacy in the on-line world that could suppress or delete a lot of that information. And sitting between these two extremes of preservation and privacy is the question of who owns our information on-line. While I may be able to delete an ill-advised Twitter or Facebook post, do I actually own or control it in the long run? Without diving into the endless user agreements that we never read when we sign up for services, we must wrestle with the question of what we are giving up to have access to these free services that post our personal lives for all to see. They are ostensibly free because we trade our information (both the boxes we fill out and the data that can be mined from our likes, shares, posts, and sites visited) so that advertisers can sell things to us and so that our information can be sold to those same advertisers. This is the agreement: the commodification of our lives, which can only happen because we hand over the rights to our information, in some form, to Facebook, Twitter, or whatever comes next.
John Lanchester wrote in an essay for the London Review of Books, aptly titled “You Are the Product”: “When the time came for the IPO, Facebook needed to turn from a company with amazing growth to one that was making amazing money…The solution was to take the huge amount of information Facebook has about its ‘community’ and use it to let advertisers target ads with a specificity never known before…That was the first part of the monetisation process for Facebook, when it turned its gigantic scale into a machine for making money. The company offered advertisers an unprecedentedly precise tool for targeting their ads at particular consumers.” Stephen M. Feldman succinctly put it this way: “users gain [access to platforms] only because they simultaneously relinquish data about their personalities and habits—data that corporations can turn into profits.”
On the one hand, making our information valuable, giving it monetary worth, has ensured that, in some way, it might actually be preserved. By creating an information economy, we are ensuring that some information is retained. This is nothing new, though. The records we do have of centuries past survive because they had some worth. Books printed in large numbers because they were popular stood a better chance of surviving than limited runs. Manuscripts that generations later became valuable to collectors as artistic objects survived, while others were recycled as material for binding. And the double-edged sword of copyright length looms over the creative works of the twentieth century, ensuring that a lot of information retains its commercial value while also making it difficult to give some works a second life in the public domain. The Internet Archive has been leading the charge in preserving much of our legacy, both digital and analog, in terms of books, music, webpages, and even video games. And recently it has made many works still under copyright available by leveraging a little-known provision of US copyright law, Section 108(h), which allows libraries and archives to reproduce and distribute published works in their last twenty years of copyright if those works are neither being commercially exploited nor available at a reasonable price.
But should the commodification of our culture, our memories on Facebook and Twitter, our lives, be a metric for the preservation of who we are and our history? And that assumes we even want it to be preserved. Almost every morning when I log into Facebook, I am greeted by a story in my feed informing me that Facebook “[cares] about you and the memories you share here. We thought you’d like to look back on this post from [x years] ago.” But why does Facebook care? The cynical answer is that it wants me to keep sharing my memories in order to feed it a steady stream of data to sell to advertisers.
What many people tend to forget when they post to Facebook, share a photo on Instagram, or send out a tweet is that the preservation of that data is controlled solely by that corporation, the Library of Congress’ long-stalled, and now ended, Twitter archive project notwithstanding. Having such information at the whim of large corporations, whose archival policies can run afoul of their desire to protect themselves, leaves a large swath of our cultural heritage at risk, and we truly have to think of our posts and tweets and photos as exactly that: a cultural heritage. It is our shared history, a collection of our values (both good and bad), and a record of who we are that we will pass down to following generations. Yet just like the books, scrolls, and records carved into stone or written on vellum, it is subject to loss. But whereas those earlier records were lost to natural decay and the occasional invocation of damnatio memoriae, these digital records could be deleted with a keystroke as a company goes out of business and its servers are wiped and resold at auction. A corporation will end up doing what is in the best interest of the company and its shareholders, not the public or our posterity.
In discussing the regulation of speech on-line, Feldman makes an observation about the multinational corporations (MNCs) that control our information and our access to platforms, one that applies equally to the preservation of our digital heritage:
The massive intermediary-MNCs therefore control and readily suppress online expression for their own purposes—profit. They have no principled concern for the First Amendment. If it is to their benefit (profit) to invoke the First Amendment, they will do so. If it is to their benefit (profit) to suppress expression, then they will do so. MNCs manipulate the First Amendment and channel individual freedoms for business purposes only.
One can easily imagine substituting “preservation” for “expression,” and “cultural heritage” or “information” for “First Amendment,” because, as Feldman summarizes, “corporations generally have one goal: to maximize profit.”
This is not to fault corporations for doing what they are supposed to do. Businesses exist to create value for owners and employees; that is what they do. Rather, the aim of shining a light on the issues surrounding the information we share, and on the danger of a lost age at the hands of its supposed stewards, is to force our society to face the hard questions of history, heritage, memory, and all those moments that could be lost in time.
* * *
Ashley Dunn’s piece for the New York Times is titled “Web’s Evanescence Creates Challenge for Archivists,” which rather evocatively gets to the issue at hand. On the one hand lies the ever-changing nature of the web, with its ability to be overwritten, deleted, and otherwise altered in ways that make preservation difficult, to say nothing of the way changing technology creates barriers to access and to preserving how a site was originally displayed. On the other hand, though, there is immediate access to information on a truly staggering scale. We used to be able to count on the privacy of our lives and information because copies were scarce or difficult to access. Now, though, we have the tools to sift through petabytes of data before our first cup of coffee has finished brewing. Compounding this, we seem comfortable trading our personal information, our personal lives, and our personal memories for access to networks on which to share that data with friends. We are willingly giving companies access to some of our most intimate information and allowing them to commodify it in exchange for providing a service to us.
And what role will librarians play in this brave new world? We have already faced a professional crisis with the “Google problem,” though we remain as busy as ever. So perhaps we need to ask ourselves what our ethical obligation is regarding the information that our patrons create when they use our computers to access the internet. Just as we have taken stands as a profession on censorship and access, on the archiving of documents and holding the powerful accountable, maybe it is time for our profession to look critically at social media, social networks, and the use of our information, rather than only at how we can use these tools in our next promotional campaign. Just as we protect our users’ information from improper searches by law enforcement, perhaps we need to better educate ourselves about how our information is being used so that we can, in turn, educate our patrons. And along with teaching them how to protect their information, we can also teach them how to archive their own digital lives in a way that actively preserves them, and how to leave an interoperable legacy for generations to come. This is not so far-fetched an idea; the Library of Congress has an entire webpage devoted to it. Regardless, we must do more to address these issues, or this age will truly be “lost in time, like tears in rain.”
“I’ve seen things you people wouldn’t believe,” begins Roy Batty in his famous death monologue from Blade Runner. In many ways, in the almost twenty years since Dunn wrote her piece for the Times, our culture and society have changed in ways that a person in 1998 would scarcely believe. We have willingly given up our information in a bargain that many of us do not fully understand. We have traded our memories for a service whose benefit to society is still in question. And overarching it all is the race to either preserve those memories or wipe them from existence.