The University of Michigan Library, the University of Oxford’s Bodleian Libraries and ProQuest have made public more than 25,000 manually transcribed texts from the first 200 years of the printed book (1473–1700).
by Sydney Hawkins
THE TEXTS OF the first printed editions of Shakespeare, Chaucer and Milton as well as lesser-known titles from the early modern era can now be freely read by anyone with an Internet connection.
These texts, including more than 5,600 from U-M, represent a significant portion of the estimated total output of English-language work published during the first two centuries of printing in England.
The release (via Creative Commons Public Domain Dedication) marks the completion of the first phase in the Early English Books Online-Text Creation Partnership (EEBO-TCP). An additional 40,000 texts are planned for release into the public domain by the end of the decade.
The EEBO-TCP texts were transcribed from ProQuest’s Early English Books Online (EEBO), a subscriber database of facsimile images obtained from books in libraries all over the world, including the British Library, the Folger Shakespeare Library and the Bodleian Library at Oxford.
Among them are some of the first books printed in English, a body of work that includes early English literature as well as works of history, philosophy, politics, religion, music, mathematics and science.
Highlights include several of William Caxton’s editions of the works of Chaucer, the first translations of Homer by the Elizabethan dramatist and classical scholar George Chapman, and Sir Isaac Newton’s Philosophiae naturalis principia mathematica.
Possibly of even greater value are the thousands of less famous texts that offer unexplored avenues for discovery: gardening manuals, cookery books, ballads, auction catalogues, dance instructions and religious tracts detail the commonplace of the early modern period; books about witchcraft and sword fighting document its more exotic facets.
Many of these works have never before been available to the public online, and physical copies are rare and require special handling.
The transcribed texts, as open data, are freely available for anyone to read, reuse, reproduce, repurpose and distribute (ProQuest’s EEBO image database remains available only to subscribers).
The Partnership That Made It Possible
At its inception in 1999, the aim of EEBO-TCP was to convert the extraordinary corpus EEBO represents into fully searchable digital texts.
For modern printed works, such conversions rely upon optical character recognition (OCR), which can automatically produce searchable text from scanned images. But these first printed works use character sets and spelling that aren’t OCR-friendly. Age and print quality present additional hurdles to machine readability.
Converting EEBO texts therefore requires painstaking manual keyboarding, the addition of Extensible Markup Language (XML) tags to encode the structure of each text (chapter divisions, tables, lists, etc.), and a thorough editorial process to ensure accuracy.
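To give a sense of what structural encoding makes possible, here is a minimal Python sketch. The XML fragment, its element names (`div`, `head`, `p`, `list`, `item`) and the two helper functions are invented for illustration and are assumptions, not the actual EEBO-TCP markup scheme or any project tooling. The point is only that once chapters and lists are tagged, software can outline and search a text rather than treating it as one undivided run of characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element names and the text itself are
# illustrative only, not taken from the EEBO-TCP files or their schema.
SAMPLE = """
<text>
  <div type="chapter" n="1">
    <head>Of the ordering of a garden</head>
    <p>First chuse a plot of good earth.</p>
    <list>
      <item>Rosemary</item>
      <item>Thyme</item>
    </list>
  </div>
  <div type="chapter" n="2">
    <head>Of the keeping of bees</head>
    <p>Set the hives toward the morning sun.</p>
  </div>
</text>
"""


def chapter_outline(xml_source):
    """Return (chapter number, heading, list-item count) for each chapter."""
    root = ET.fromstring(xml_source)
    outline = []
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        heading = div.findtext("head", default="").strip()
        item_count = len(div.findall("./list/item"))
        outline.append((div.get("n"), heading, item_count))
    return outline


def search_paragraphs(xml_source, term):
    """Return the text of every paragraph containing term (case-insensitive)."""
    root = ET.fromstring(xml_source)
    needle = term.lower()
    return [p.text for p in root.iter("p") if p.text and needle in p.text.lower()]


if __name__ == "__main__":
    for number, heading, items in chapter_outline(SAMPLE):
        print(f"Chapter {number}: {heading} ({items} list items)")
    print(search_paragraphs(SAMPLE, "hives"))
```

The same idea scales from this toy fragment to a full transcription: the tags are what let a reader jump to a chapter or restrict a search to a list or table.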
Getting the conversion done required a transnational collaborative enterprise driven by the U-M Library; the Bodleian Digital Library at Oxford; ProQuest; the Council on Library and Information Resources; and Jisc, a charity that provides digital solutions to U.K. education and research, with the support of more than 160 partner libraries.
“The open access release of the first group of EEBO-TCP texts marks an important milestone in an extraordinary international partnership between public and private entities,” said Charles Watkinson, U-M associate university librarian for publishing. “The opportunity now exists for scholars both within and outside the academy to apply powerful digital scholarship tools to a huge body of material that is of central importance to world culture. The University of Michigan Library is proud to continue to support this landmark project.”
EEBO-TCP has already provided key source material for scholars with institutional access, and has contributed to monographs, articles, essay collections and scholarly editions as well as computer-aided linguistics.
The release into the public domain creates new opportunities for research around the globe, and for corpus-based textual analysis (the entire body of work can be downloaded by anyone via box.com).
Full-text public access to the transcribed EEBO-TCP texts is hosted by the U-M Library. The Bodleian offers individual text downloads in several formats, including ePUB files.