a centuries-old conception of the library as an enclosed instantiation of the universe's mighty sprawl.
But as with any new frontier, formidable challenges attend exciting possibilities—and nowhere has this been more apparent than in the efforts of the Digital Public Library of America, a coalition spearheading the largest effort yet to curate and make publicly available the "cultural and scientific heritage of humanity," with a focus on materials from the U.S., by harnessing the Internet's capabilities. The DPLA hopes to create a platform that will orchestrate millions of materials—books from public and university libraries, records from local historical societies, museums, and archives—into a single, user-friendly interface accessible to every American with Internet access.
While ambitious, the project was not unprecedented. The creation of a large-scale digital library catering to public access has been attempted for decades, by a cast of characters worth noting. Aside from Google, there's the Internet Archive, a non-profit digital library based in San Francisco that sees itself as a bulwark against a modern-day version of the loss of the Library of Alexandria. Brewster Kahle, who founded the Internet Archive in 1996 and is now on the DPLA steering committee, aims to supplement this digital reserve with a physical copy of every book in existence, collected and stored in a mammoth warehouse in California; he currently has about 500,000 volumes and hopes to reach 10 million one day. His efforts are complemented by the HathiTrust ("Hathi" is the Hindi word for "elephant," an animal that, as the saying goes, never forgets), a digital preservation repository founded in 2008 that has digitized over 10 million volumes contributed by participating research institutions and libraries. The 3 billion-plus pages amount to over 8,000 tons (but weigh close to nothing online, of course). Meanwhile, national institutions like the Library of Congress have been digitizing their in-house materials for years. The DPLA is not the first player to step onto the field.
Speaking in digital terms, the world produced more data in 2009 than in the entire history of mankind through 2008, according to the former chief scientist at Amazon.com. In one way, this explosion and the digital platforms that support it have been a boon for librarians and archivists, who specialize in collecting information and making it available to users. But in other ways, it has been a scourge, rendering the goal of staying abreast of the world's intellectual output (not to mention the hardware and software needed to store and display it) more quixotic than ever. Simply to reap the accessibility benefits that the Internet so tantalizingly affords, the centuries' worth of items currently extant only in cloth and paper need to be imaged into bits and bytes—a monumental, manpower-intensive, and prohibitively expensive task. And that is to say nothing of figuring out how to cull and catalog the terabytes of information that have spent their whole life in digital format. All of which goes to show that the problem of networking the nation's "living heritage" online has barely begun to be addressed. The problem is one of time, money, and most of all, scale—massive scale.
With hundreds of librarians, technologists, and academics attending its meetings (and over a thousand people on its email listserv), the DPLA has performed the singular feat of convening into one room the best minds in digital and library sciences. It has endorsement: The Smithsonian Institution, National Archives, Library of Congress, and Council on Library and Information Resources are just some of the big names on board. It has funding: The Sloan Foundation put up hundreds of thousands of dollars in support. It has pedigree: The decorated historian Darnton has the pages of major publications at his disposal; Palfrey is widely known for his scholarship on intellectual property and the Internet; the staging of the first meeting on Harvard's hallowed campus is not insignificant. Ideally, the consolidation of resources—specialized expertise, raw manpower, institutional backing and funding—means that the DPLA can expand its clout within the community, attract better financial support, and direct large-scale digitization projects to move toward a national resource of unparalleled scope and functionality.
In the United States, the Library of Congress does not have the same mandate, and the lack of a center of gravity at the national level has therefore led to fragmented and disorienting results for a library community already known for fierce competition among its silos ("Cooperation is an unnatural act," remarks David Ferriero, the Archivist of the United States, recalling an adage from his tenure at other libraries). "If you think about where we are today in the digital library space, there are a whole lot of efforts that are not pulling in the same direction," says Palfrey, noting the obscurity of digitizing projects that have developed under discrete directives. "I defy you to find them. They tend to be in proprietary repositories that are very hard to find."
Haphazardly, likely with the help of a search engine, without much of a sense for what repositories have made a particular text or image available in the first place, or how best to find similar materials in the future. The DPLA—by taking up the mantle of "national library," of a command center for the country's published heritage—would put its users high above street level, offering easier and more systematic navigation. The DPLA does not plan to supplant or swallow up the institutions already contributing to the digital cause, Palfrey emphasizes. These groups have been building their own digital collections on a scale that befits their resources, leaving the DPLA to hone its responsibility into one of supporting, managing, and organizing—rather than of generating all the raw material. "We want to build infrastructure that will support public and academic libraries," Palfrey says. "We're not building the end-all, be-all digital library."
the question that drove the discussion was: "Could we do better than Google?" As Palfrey puts it: "The idea of having any single company control such an extraordinary public resource strikes me as a bad idea. Could we do better if we were to have a massive effort toward creating a digital public library that was not driven by a single corporation, but driven by a broad coalition of people?" Palfrey is careful to note that the DPLA is not a replacement for, or adversary to, Google Books (in fact, the DPLA hopes to draw upon Google's digital reserve in some sort of collaborative arrangement). But there's no denying the company's pervasive specter: If Google is a dubious father figure, the DPLA is a son attempting to get out from under the shadow of its forebear.
Some of the less savory elements of the Google Books Library Project passed quietly: Google employees were often negligent, filing Whitman's Leaves of Grass under "Gardening," for example, and what was supposed to have been a "free" program cost Harvard nearly $2 million (the university had to process 850,000 books to be digitized by Google). The more public bungle began in 2005, when the Authors Guild and the Association of American Publishers sued Google for violation of their copyrights. This marked a critical juncture: Google could have made a case for fair use to help expand the public's access to the literature. Instead, the company entered a period of intense, secret negotiations with the plaintiffs.
In October 2008, the groups emerged with a proposed settlement that effectively mutated Google's original vision of a digital archive into its ugly sister: a library and bookstore business. Under the terms of the settlement, users would be able to buy individual e-books, and libraries would be able to purchase a subscription for access to Google's entire catalog of books (books that these libraries, in some cases, had provided and processed themselves). The cryptic-sounding "Book Rights Registry"—a body composed of representatives from the Authors Guild, AAP, and Google—would determine the prices. Thirty-seven percent of the profits would go to Google, and the rest to the authors and publishers. Not surprisingly, the proposed settlement triggered widespread accusations of the commercialization and monopolization of knowledge, and it even prompted an investigation from the Department of Justice about the possible violation of the Sherman Antitrust Act. Harvard backed out of its partnership with Google (many of the other libraries continue to work with the company).
In November 2009, the three groups filed an amended settlement and awaited a decision, which would not come for another few years. Thus the first gathering in October 2010 of what would become the DPLA came at a time of taut energy. With the settlement on the table, along with the corresponding possibility that Google might be closing the door on public access and standing to profit from books that universities had made available, the matter seemed urgent. "I think the main point is that Google turned into a commercial digital library," Darnton says, "one without any constraints on its pricing policy." How had a private company come so close to controlling the fates of millions of books—and of possibly convincing users to agree to this arrangement? Who was defending the public interest?
For all their differences, Google and the DPLA do share a major hurdle: Copyright law, which prevents the digitization of orphan works, numbering around 5 million and constituting about 50 to 70 percent of books published after 1923. Orphans are works whose rights holders are not known; they may be dead or unaware of their entitlement. Google's settlement would have given the company license to appropriate orphan works for posterity—a move that would have opened up a trove of previously unavailable works, at the expense of granting Google unprecedented control through litigation. The DPLA faces a similar problem: As some members pointed out in a gathering last year, out-of-print and orphan works—content in the "yellow zone" of copyright—outnumber both public domain and in-copyright works, "making legal reforms necessary for the success of a DPLA."
The DPLA will not violate copyright, and it will begin with a foundation of public-domain works. The organization is trying to figure out the best case for fair use of out-of-print or unpublished works to argue that public access to this literature benefits society and serves a "higher" purpose.
The DPLA, Darnton wrote, would "contain nearly everything available in the walled-in repositories of human culture"; the library would be "the greatest that ever existed."
A decade ago, libraries—particularly those at universities—were willing to accept the restrictions imposed by Google, so firm was their belief that they needed the company's help to go digital, according to Kahle. But he sees the progress of groups like the Internet Archive as proof that "we can actually do this ourselves." So what new purpose will the DPLA serve? There is a fine difference between supporting and competing with the digitization efforts of member institutions—including small public libraries, whose own collections are modest in comparison to what is being made available online.
Ferriero, the Archivist of the United States and active member of the DPLA team, wants "every stinking thing" in the National Archives digitized, but the agency has "just a toe in the water" (over 74 million pieces of paper, or less than 1 percent of total holdings, have been digitized). "Everything we do is piece by piece by piece, page by page by page, image by image by image—and that is a huge task," says Brenda Kepley, chief of the processing section at the Archives. To make substantial progress, the Archives has had to forge digitizing partnerships with universities and commercial companies—but the concern that the agency is years behind persists ("You mean, it's not all online?" people asked Archives staff in the mid-1990s, when the Internet was still in its infancy). Ferriero hopes that the DPLA will expedite digitizing at the Archives and draw greater attention to its untapped resources.