The HathiTrust Research Center (HTRC) provides tools and access to the text of works in the HathiTrust catalog, enabling new forms of scholarly research driven by text mining and other computational methods for analyzing a large corpus of text. Originally, these functions were limited to works in the public domain. In late September 2018, however, HathiTrust changed its policy and released an updated HTRC Analytics, and it now provides access to the text of the complete 16.7-million-item HathiTrust corpus, including items protected by copyright, for non-consumptive research such as data mining and computational analysis. The tools covered by this policy change include:
- HTRC Algorithms: A set of click-and-run tools for performing computational text analysis on volumes in the HathiTrust Digital Library. The algorithms enable exploration, analysis, and visualization of public work sets or those created by the researcher.
- Extracted Features Dataset: Research datasets that allow non-consumptive analysis of specific features extracted from the full text of the HathiTrust corpus.
- HathiTrust+Bookworm: A tool that enables visualization and analysis of word usage trends in the HathiTrust corpus.
- HTRC Data Capsule: A system of secure computing environments for performing researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items in a Capsule is available ONLY to HathiTrust member-affiliated researchers.
An updated chart on tool access provides additional details. More information on using the HTRC is provided in their Getting Started Guide.