Molly O’Hagan Hardy is AAS digital humanities curator and an ACLS public fellow. Every month on Past is Present she will be sharing news on digitization efforts at AAS, coverage of digital humanities projects using AAS materials, and ideas for such projects. Stay current with all things DH at AAS by checking out the “Digital AAS” section of our website.
After attending some excellent panels at the recent Modern Language Association Convention on work being done in the digital humanities, I was struck by the number of projects in early modern and eighteenth-century British literary studies that rely on the Text Creation Partnership (TCP). Prominent eighteenth-century British literature scholar and digital humanist Ted Underwood describes Gale Cengage’s Eighteenth Century Collections Online (ECCO) partnership with TCP as “an ideal solution” to the problem of procuring clean, machine-readable texts. Anupam Basu relies heavily on the fruits of ProQuest’s Early English Books Online (EEBO) partnership with TCP for his innovative work on Shakespeare. And yet both at the MLA Convention and in some post-MLA snooping around, I could find no examples of early Americanists making use of Readex’s Evans Early American Imprints Series, 1639-1800 partnership with TCP. The most obvious reason for this difference is scale: while tens of thousands of titles have been included in EEBO-TCP, only 5,000 titles from Readex’s Evans have been included. This makes Evans-TCP less than ideal for large-scale text mining projects, but its corpus could still be used for data analysis of a single text or author, for building digital scholarly editions (see how this access works in the section below), or for pedagogical purposes (many of the titles in Evans-TCP have not been republished in modern editions for the classroom). In what follows, I explain how one might conceive of a project using Evans-TCP by describing what it is and how it can be used now by those at partnering institutions and, in the near future, by researchers anywhere.
What is the Evans-TCP?
Evans-TCP is a partnership among the TCP, NewsBank/Readex Co., and the American Antiquarian Society that, between 2003 and 2009, created almost 5,000 accurately keyed and fully searchable SGML/XML text editions. In other words, actual people have typed every word of the selected texts, yielding a much higher degree of accuracy than the optical character recognition (OCR) software that Readex relies on to transform the scans of texts into words. Not only are searches of such texts more reliable, but through Evans-TCP a user can also see the keyed-in text that she searches. When working in the Readex database, by contrast, a user sees only the image of the original text that has gone through OCR software, not the text that is actually being searched. Moreover, in Evans-TCP, a user can also access the XML file created by the TCP.
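For readers curious about what working with one of these XML files might look like, here is a minimal sketch in Python that counts word frequencies in the body of a transcription. It is an illustration under stated assumptions, not a description of how TCP files are actually structured: the sample document, the function name word_counts, and the element name "text" are all invented for the example, and real TCP files are far larger, use their own element names, and may use XML namespaces that would require adjusting the lookup.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample standing in for a keyed transcription. Real TCP files
# are much larger and their element names may differ (assumption).
SAMPLE_XML = """<doc>
  <header><title>A Sample Sermon</title></header>
  <text>
    <p>The people of the town heard the sermon.</p>
    <p>The sermon was printed in Boston.</p>
  </text>
</doc>"""


def word_counts(xml_string, body_tag="text"):
    """Count lowercase word frequencies inside the first <body_tag> element.

    body_tag is a hypothetical element name; if no such element is found,
    the whole document (header included) is counted instead.
    """
    root = ET.fromstring(xml_string)
    body = root.find(f".//{body_tag}")
    if body is None:
        body = root
    # Join every text fragment under the body, ignoring the markup itself.
    plain = " ".join(body.itertext()).lower()
    words = re.findall(r"[a-z]+", plain)
    return Counter(words)


counts = word_counts(SAMPLE_XML)
print(counts.most_common(3))
```

Because the count is limited to the body element, words that appear only in the header (such as the title) are left out. The same approach could be applied to a downloaded file by reading its contents into a string first.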
How can I find out which early American texts are included in the TCP?
In consultation with a number of scholars, AAS selected the titles, within the date range of 1640 to 1800, to be included in Evans-TCP. The Evans-TCP page offers a number of options for navigating through these titles: simple, Boolean, proximity, and citation searches, as well as browsing. The searching is fairly intuitive, but if you need help, check out these instructional videos (though they were created for EEBO-TCP, the interface is pretty much the same).
The AAS General Catalog is another way to find texts in Evans-TCP. Note that on the upper right side of the screen in our catalog record (see the screenshot below), a user will see a link to any title included in Evans-TCP. As of now, the lock icon next to the link indicates that one must be either on the AAS campus or on that of a partnering institution in order to access the Evans-TCP text, but read on to find out how that is changing SOON.
Who can access the TCP and how does access work?
This is the really great news: as of June 30, 2014, anyone anywhere will be able to access the Evans-TCP texts. TCP welcomes requests for source files (now from users at partnering institutions, but soon from everyone), whether for individual texts or for the whole corpus of its titles. After the June release date, anyone will be free to access and make use of these raw files through an online directory where they can be downloaded. Rebecca Welzenbach, TCP outreach librarian at the University of Michigan, explains, “Our intention is to make them available in such a way that people can find and download them without having to come through us.” Welzenbach does offer one reminder: although TCP includes links to the NewsBank/Readex page images, these will be available only at subscribing institutions. The TCP makes the XML-encoded transcriptions, not the whole database, available. It is, however, these transcriptions upon which digital humanities work from the early modern period to the nineteenth century relies, and we at AAS would love to hear how early Americanists are making use of this incredible resource.