Corpora

The following corpora were used for the computational side of the project. The Agony Column corpus includes two decades’ worth of agony advertisements (from 1860 to 1879) transcribed from archival image files into raw text. The newspaper issue from which each text file is derived is noted in the file name. The Victorian Novels corpus includes 220 full-length novels, some of which we classify as "newspaper novels" based on how scholars have referred to them in the past. At the head of each of these files is the novel’s corresponding metadata, which can also be found in the metadata spreadsheet.

Archival image files of The Times were sourced through The Times Digital Archive, a Gale-Cengage resource. Optical character recognition (OCR), a process of image-to-text conversion, was completed using Transkribus, with generous support from READ-COOP. We would like to thank Jean-Philippe Moreux of Gallica (Bibliothèque Nationale de France) for consulting on data extraction. Data on the Victorian novels was sourced from Project Gutenberg, txtlab's NOVEL450 data set, and HUM19UK.

Agony Columns

A corpus of 650k sentences scraped from the Agony Column of The Times between 1860 and 1879.

Victorian Novels

25+ million words from a corpus of 220 Victorian novels (1800-1920).

Metadata for Novels


Bibliographies

The following are select bibliographies used over the course of the project, including primary and secondary source material.

Project Proposal Bibliography (2019)

Select Secondary Sources (2022)

Exhibition Items (2022)