Big Data in Early America: Bibliometrics and The North American Imprints Program (NAIP)

In recent years, and in a variety of ways, librarians have been considering how the methodologies now brought to bear on historical inquiry might shift their own practices. Recent examples include Meg Phillips’s post, in which she asks whether archival appraisal should be informed by distant reading practices and designed to support them. Doing so would mean that archivists would still appraise, but “at a different level of granularity.” Catalogers have also been asking themselves how the uses of online public access catalogs (OPACs) are changing. The work of digital humanists has in part sparked such questions. The innovative work of Erin Blake, Head of Collection Information Services at the Folger Library, and Mitch Fraas, Curator at the University of Pennsylvania’s Kislak Center, models what we might do with machine-readable cataloging (MARC) records. Humanists’ ability and desire to work with large data sets mean that the systems in which those data are generated are being considered anew.

For early American bibliometrics, The North American Imprints Program (NAIP), which is part of our General Catalog, is the place scholars will turn for distant reading and big data, as it contains records for United States imprints published from the beginning of American printing in 1639 through the centennial of American independence in 1876. For 35 years, this deep cataloging work has progressed in a series of phases, with generous support from the National Endowment for the Humanities (NEH). Most of NAIP is best understood as a union catalog: for the period before 1801, it includes records not only for imprints held at AAS but also for imprints held at other libraries, and so seeks to be a comprehensive catalog of early American imprints. Our catalogers are currently at work on the 1801-1820 segment of the file, which likewise includes entries for titles held at AAS and at other libraries.

In other words, NAIP is the equivalent of the English Short Title Catalogue (and AAS has contributed mightily to the ESTC since its inception) for North American imprints. Both of these composite catalogs have amazing potential to serve the work of digital humanities, as they can, to some extent, be understood as large-scale datasets related to both the people and the products of the book trade in the English-speaking world up to 1800. Such work should not be undertaken without a healthy amount of skepticism and caution, however, because these catalogs were not originally conceived as sources for bibliometrics, even though they are now being used that way. Scholars such as Stephen Karian (see The Age of Johnson 21 (2011): 283-297) have pointed out the limitations of using the ESTC for such purposes, and NAIP too has been the subject of debate when used as a dataset rather than as a catalog.

In his “Note on Statistics” in the first volume of The History of the Book in America, editor Hugh Amory offers a cautionary note on extracting such conclusions from NAIP. The graphs in this appendix are generated from statistics pulled from NAIP, and Amory warns that because “NAIP was never intended to provide reliable and useful statistics of printing or publication…our statistics may be a better measure of modern American library economy, collection policies, and cataloguing practices than of books.” Amory cites a number of reasons why he ultimately agrees with Thomas Tanselle that “though the data of such union catalogs may give some ‘suggestive’ measures of relationships, their absolute value is of little worth.” Among these is the fact that a single book may have more than one record and one record might cover more than one book. In addition, “any consistent treatment of books and ephemera is impractical, given the haphazard formation of the library collections on which NAIP is based.” I read Amory’s comments less as critical of NAIP than as cautious about using NAIP, or any catalog for that matter, for a purpose other than the one for which it was intended. Such catalogs were constructed for the purpose of recording a library’s holdings, and when we use them to derive statistics, we will encounter inconsistencies that result not from the catalog’s failings, but from its construction with another aim in mind.
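To make Amory's point about records and books concrete, here is a minimal sketch in Python. The data are invented for illustration and the field names (`record_id`, `book_ids`) are hypothetical; they are not NAIP's actual structure. The sketch only shows that counting catalog records and counting the books they describe can give different totals.

```python
# Invented toy data: each catalog record lists the books it describes.
# Field names are hypothetical and do not reflect NAIP's real schema.
records = [
    {"record_id": "r1", "book_ids": ["b1"]},
    {"record_id": "r2", "book_ids": ["b1"]},              # a second record for the same book
    {"record_id": "r3", "book_ids": ["b2", "b3", "b4"]},  # one record covering several books
]


def count_records(records):
    """Number of catalog records, which is what a naive query would report."""
    return len(records)


def count_distinct_books(records):
    """Number of distinct books described across all records."""
    return len({book for record in records for book in record["book_ids"]})


print("records:", count_records(records))
print("distinct books:", count_distinct_books(records))
```

In this toy case the two totals disagree, and in a real catalog the gap could run in either direction, which is one reason absolute figures drawn from record counts deserve caution.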

And yet, might statistical analysis derived from catalogs still be meaningful? Robert Gross, editor of volume two of the same series, thinks so. He offers a different perspective on NAIP: “Thanks to a sophisticated classification of imprints that goes beyond the standard record…and identifies items by genre, series, illustrator, printer, bookseller, publisher, place, date, and language of publication, the [NAIP] catalog allows us to trace the volume and distribution of printed works over time and space and to gauge the relative importance of different types and their makers in the total output.” Gross admits that such tracing and gauging can be more accurate in certain decades than in others, but he still presents a much more optimistic view than Amory of the value of using the catalog as a dataset from which bibliometrics can be extracted.
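The kind of tracing and gauging Gross describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four records below are made up, and the field names (`year`, `place`, `genre`) are hypothetical stand-ins rather than NAIP's own fields. The sketch tallies imprints by decade and by a chosen field, then expresses each tally as a share of that decade's total.

```python
from collections import Counter

# Made-up records for illustration; field names are hypothetical, not NAIP's schema.
imprints = [
    {"year": 1741, "place": "Boston", "genre": "Sermons"},
    {"year": 1745, "place": "Boston", "genre": "Almanacs"},
    {"year": 1748, "place": "Philadelphia", "genre": "Almanacs"},
    {"year": 1752, "place": "Philadelphia", "genre": "Sermons"},
]


def decade(year):
    """Round a year down to the start of its decade, e.g. 1745 -> 1740."""
    return year // 10 * 10


def tally(imprints, field):
    """Count records per (decade, value of the chosen field)."""
    return Counter((decade(r["year"]), r[field]) for r in imprints)


def shares_by_decade(counts):
    """Express each count as a fraction of all records in the same decade."""
    totals = Counter()
    for (dec, _value), n in counts.items():
        totals[dec] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


by_place = tally(imprints, "place")
for (dec, place), share in sorted(shares_by_decade(by_place).items()):
    print(dec, place, round(share, 2))
```

Relative shares of this sort are closer to the "suggestive" measures of relationships that Tanselle allows for than to absolute counts, though they still inherit whatever unevenness the underlying records carry.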

My purpose here is not to adjudicate, but to point out that the uses of catalogs are changing. As humanists develop appetites for and abilities to process large data sets, we are putting new demands on catalogs and the records they contain. The catalog records holdings of material objects, but it also reflects the ways in which these objects exist within “library economy, collection policies, and cataloguing practices,” to return to Amory. Thanks to the sagacity and innovation of Carl Stahmer, Benjamin Pauley, and others, the ESTC is leading the way in designing a union catalog for the twenty-first century, and we at AAS are watching their work closely.

Published by

Molly O'Hagan Hardy

Molly O’Hagan Hardy is AAS Director of Digital and Book History Initiatives. She shares news on digitization and cataloging efforts at AAS, coverage of digital humanities projects using AAS materials, and ideas for such projects. Stay current with all things DH at AAS by checking out the “Digital AAS” section of our website.
