ICHORA 2020, DAY 3: Bethany Anderson, “Machine Learning and Archival Practice”

Note: This presentation was part of a three-person panel entitled “Histories of Data Archiving: From Fisheries Research to Cybernetics.”

Dr. Bethany Anderson’s (University of Illinois at Urbana-Champaign) fascinating presentation “Machine Learning and Archival Practice: A Cybernetics Case Study on Computational Approaches to Digital Materials” reports on the results of a two-year, NEH-funded project at the University of Illinois to explore new ways to “create access to [a] computational archival project and the data it generated that is in concert with current and emerging needs.” Anderson, who serves as the Natural and Applied Sciences Archivist and Assistant Professor within the University of Illinois Archives, arrived in Urbana-Champaign in October 2018, midway through The Cybernetics Thought Collective: A History of Science and Technology Portal Project, which she co-directed with Christopher J. Prom (University of Illinois Archives). [1]

At its heart, the publicly accessible online Cybernetics Portal Project brings together a “dispersed archival record,” hitherto scattered across the physical papers of Heinz von Foerster (University of Illinois Archives), Warren S. McCulloch (American Philosophical Society), W. Ross Ashby (British Library), and Norbert Wiener (MIT Institute Archives & Special Collections). By digitizing a portion of the correspondence and publications of these four ‘men of science,’ the project gives historians of mid-twentieth-century science quick and convenient access to the intellectual roots of cybernetics. Anderson does not address the gender dimensions of how correspondence and publications were selected for digitization, mentioning only in passing that the anthropologist Margaret Mead attended at least one Macy conference of cyberneticians. Mead is elsewhere characterized as the “founding mother of cybernetics,” a point the project’s digitization priorities could have reflected. [2]

Yet, as Anderson stresses, the project’s purpose went beyond mere digitization: it used machine learning and natural language processing to accomplish three additional goals: (1) produce “machine generated” archival metadata; (2) create reusable data of a “thought collective” for future researchers; and (3) depict the “creation of scientific archives” through several types of text visualizations. The products of these efforts can be viewed online within the Cybernetics Portal Project or downloaded and manipulated offline by future researchers. As a result, Anderson argues, the digital humanities approach deployed in the portal could serve as a model for others who wish to explore other communities of people exchanging ideas through written correspondence, presumably including email, text messages, and tweets.

That said, the social utility of some aspects of the project deserves scrutiny. What might the history of cybernetics tell us? It is not clear whether a community of scholars clamored for the correspondence of these four particular cyberneticians to be digitized, analyzed by machines, and rendered accessible online. Moreover, neither Anderson’s statement that a list of cybernetics terms compiled by humans before the machine learning tools were applied “did indeed anticipate much of the data that was machine generated from the materials,” nor her caveat that the machine learning tools worked best “on papers no longer than eight to ten pages,” inspires much confidence that the selected digital tools are worth the effort.

A further constraint on future applications of these tools is that the portal relies mainly on typewritten twentieth-century texts that could easily be processed with OCR. Because handwritten correspondence must first be transcribed, text visualization research on such material is invariably slower, which biases visualization work toward typewritten and printed materials. [3] Finally, as Anderson readily admits, the suggestion that data sampling using natural language processing could “represent” the existence of a “community of provenance,” following the work of Jeanette Bastian and Emily Monks-Leeson, still awaits further projects to test this reimagined definition of provenance. [4]

Dr. Eric C. Stoykovich

College Archivist and Manuscript Librarian, Watkinson Library, Trinity College (Hartford, CT)

[1]  https://archives.library.illinois.edu/thought-collective/credits/

[2] “Founding mother of cybernetics” is cited in “Cybernetics of Cybernetics Competition: Context,” https://www.asc-cybernetics.org/CofC/. The closest explanation the Cybernetics Portal Project offers for why these particular four scientists were chosen is “The Cybernetics ‘Thought Collective,’” https://archives.library.illinois.edu/thought-collective/cybernetics-thought-collective/.

[3] The project’s White Paper does concede that “while the vast majority of the materials are typewritten, handwritten correspondence can be found throughout the personal archives of the four cyberneticians; these handwritten materials will need to be transcribed in order to be processed for the analysis engine.” (page 4, https://www.ideals.illinois.edu/handle/2142/106050).

[4] Emily Monks-Leeson, “Archives on the Internet: Representing Contexts and Provenance from Repository to Website,” The American Archivist 74, no. 1 (1 April 2011): 38–57, https://doi.org/10.17723/aarc.74.1.h386n333653kr83u.
