A hands-on text data mining (TDM) workshop using Baylor's Armstrong Browning Library's Victorian Collection
Workshop Steps:
The Armstrong Browning Library is home to the world's largest collection of Robert Browning and Elizabeth Barrett Browning research resources. Robert Browning (May 7, 1812 – December 12, 1889) is the British poet credited with creating and popularizing the dramatic monologue form of poetry. He was so popular that Browning Societies, dedicated to gathering to read and discuss his work, began during his lifetime and continue to this day. Robert Browning was married to Elizabeth Barrett Browning (March 6, 1806 – June 29, 1861), one of the foremost British poets of the 19th century.
A. J. Armstrong was a Robert Browning scholar and Chair of Baylor's English Department from 1912 to 1952. In 1918, Armstrong donated his personal library of books and periodicals by and about Robert Browning to the Baylor University Library. He continued to gather into Baylor's Browning Collection every item of interest connected with Robert Browning that could support intensive or extensive study of the poet. When the collection outgrew its home in Carroll Library, Armstrong undertook fundraising to build a library specifically for Baylor's Browning Collection. Construction of the Armstrong Browning Library was completed in 1951.
The Victorian Collection includes more than 8,000 letters and manuscripts by or to Browning family members and other British and American figures, both prominent and lesser known. The Armstrong Browning Library acquired some of these items because of the connection of either the author or the recipient (the intended audience) to the Brownings. In many instances, a single Browning resource was included as part of a group of 19th-century items. The collection includes letters and manuscripts from many notable nineteenth-century authors, such as Charles Dickens, William Wordsworth, Samuel Taylor Coleridge, Thomas Carlyle, John Henry Newman, George MacDonald, and John Ruskin. It also includes letters and manuscripts from political figures, religious leaders, scientists, artists, art collectors, and explorers. To increase awareness of the Victorian Collection, the Armstrong Browning Library has digitized more than 3,000 of its letters and manuscripts.
Download Victorian Collection Workshop Data Here
What is Metadata?
Simply put, metadata is information about a dataset.
The Victorian Collection metadata contains descriptive, structural, and administrative metadata. It also includes the full text of each item, where digitized.
The file victorian_table_raw.csv contains the data extracted from the Baylor University Libraries Digital Collections.
Each row represents a document page.
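Because each row is a single page, a multi-page letter is spread over several rows. The sketch below shows one way to join page rows back into whole documents using only Python's standard library. The column names (`document_id`, `page_number`, `full_text`) and the sample rows are hypothetical placeholders; check the header row of victorian_table_raw.csv for the actual field names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for victorian_table_raw.csv.
# The real file's column names and contents will differ.
SAMPLE_CSV = """document_id,page_number,full_text
letter-001,2,and I remain yours truly.
letter-001,1,My dear friend
letter-002,1,
"""

def pages_to_documents(csv_file):
    """Join page-level rows into one text string per document."""
    pages = defaultdict(list)
    for row in csv.DictReader(csv_file):
        text = row["full_text"].strip()
        if text:  # pages without a transcription are skipped
            pages[row["document_id"]].append((int(row["page_number"]), text))
    # Sort each document's pages by page number before joining.
    return {doc_id: " ".join(text for _, text in sorted(items))
            for doc_id, items in pages.items()}

documents = pages_to_documents(io.StringIO(SAMPLE_CSV))
```

To try this on the workshop data, pass a file opened with `open("victorian_table_raw.csv", newline="", encoding="utf-8")` instead of the in-memory sample, after adjusting the column names.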
Descriptive Fields
Structural Fields
Administrative Fields
How documents were transcribed: student workers manually transcribed the pages.
Seven Broad Text Data Mining Workflow Procedures (the workshop focuses on the highlighted items)
This step is optional: Follow along or just watch
Text Data Mining Procedures Covered in this Section:
Python Script Using the Following Libraries:
Click the image below to launch Google Colaboratory
The Voyant home screen accepts uploads in a variety of languages: Arabic, Bosnian, Croatian, Czech, English, French, Hebrew, Italian, Japanese, Portuguese, and Serbian (Auto-Detect is the default). It also accepts a variety of formats: TXT, HTML, XML, PDF, RTF, MS Word, and ZIP.
Voyant shows five "skins" by default:
Cirrus - word cloud
Reading - text being analyzed
Trends - top keywords visualized across 10 equal segments of text
Summary - key points
Contexts - keyword plus 5 words to either side
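To make the Contexts and Trends descriptions concrete, here is a minimal Python sketch of both ideas: a keyword with up to N words on either side, and keyword counts across equal slices of the text. It is an illustration only, not Voyant's own implementation; the tokenizer, the function names, and the sample sentence are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def contexts(tokens, keyword, window=5):
    """Return (left, keyword, right) tuples with up to `window` words per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

def trends(tokens, keyword, segments=10):
    """Count `keyword` in each of `segments` roughly equal slices of the text."""
    n = len(tokens)
    bounds = [round(k * n / segments) for k in range(segments + 1)]
    return [tokens[bounds[k]:bounds[k + 1]].count(keyword)
            for k in range(segments)]

# Made-up sample text; a small window and segment count keep the result readable.
sample = "the letter came today and the letter was long but the reply was short"
tokens = tokenize(sample)
kwic = contexts(tokens, "letter", window=2)
counts = trends(tokens, "letter", segments=2)
```

With the defaults (`window=5`, `segments=10`) the functions mirror the settings described for the Contexts and Trends skins, though Voyant's tokenization and segment boundaries may differ.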
The available options appear when you mouse over the upper-right corner of each skin:
Visualization URL / Change Tool in this Skin / Options / Help
Available skin view / Available skin view / Current skin view / Help
Editing the Stop Word List:
Review the words showing in the word cloud to identify any you want to eliminate (you may repeat this several times during your analysis process).
Choose the Options button in the Cirrus skin.
Check that the language is either Auto-Detect or the language you are analyzing.
Add your chosen stop words, one per line.
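The same review-and-repeat cycle can be sketched in Python: count word frequencies, remove stop words, inspect the top terms, then add more stop words and count again. This is a rough stand-in for what the word cloud displays, not Voyant's code; the stop word set, the function name `top_terms`, and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def top_terms(text, stop_words, n=5):
    """Return the n most frequent words that are not in stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)

# Illustrative stop word list; Voyant ships its own per-language lists.
stop_words = {"the", "and", "of", "to", "a"}
sample = "The poet wrote to the editor, and the editor wrote to the poet of a poem."

before = top_terms(sample, set())       # no filtering: function words dominate
after = top_terms(sample, stop_words)   # first pass with the stop word list

# Second pass: after reviewing the results, add another word and recount.
stop_words.add("wrote")
after_second_pass = top_terms(sample, stop_words)
```

Comparing `before`, `after`, and `after_second_pass` shows how each addition to the list changes which terms rise to the top.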
Corpus: One or several texts saved as a continuous document in a single file will be analyzed as one continuous document.
Corpora: Several texts saved individually in a single file will be analyzed as individual texts. The example to the right is for an analysis of Tom Sawyer, Huck Finn, and The Prince and the Pauper as corpora.
Go to voyant-tools.org and upload the file victorian_transcribed_no_metadata
Adding to the Stop Word List
Let's make a static version of the Cirrus word cloud:
Changing Skins: Identifying Collocates
Examine Trends for Entertainment