
Data & Digital Scholarship Tutorials

Workshop Description

Mining Social Media: Twitter, Reddit, Instagram

This workshop will guide researchers through mining Twitter data using NCapture and NVivo, Reddit content using PRAW, and Instagram content by web scraping.

Social Media Workshop


(1) Take Workshops, (2) Pass Quizzes, (3) Become a Data Scholar

Interested in becoming a Data Scholar?


Takes only six workshops!

Pick any two categories below and take at least two workshops from each (total of 4):


  • Data Visualization
  • Text Data Mining
  • Python Data Scripting
Pick any one category below and take at least two workshops from it (total of 2):

  • Research Data Management
  • Finding Secondary Data


* Workshops are offered every semester. No need to fit all 6 in one semester. Become a Data Scholar at your own pace.

* Becoming a Data Scholar is not mandatory. Take any workshop you like.

Workshop participants will use NCapture to extract recent Twitter content. We will then import the content into NVivo, where we will explore maps, word clouds, and themes.

Recent Tweets (5-7 days, up to 18,000 Tweets)

Advised Method


NCapture is a free Chrome extension that allows users to mine recent Twitter data: Tweets that include a particular word, phrase, or hashtag, or Tweets by a particular user.

Data is stored in Chrome's default download directory.

NVivo is required to access and analyze the data. The advantage of NCapture is that the captured data can be analyzed directly in NVivo.

Recent Tweets (5-7 days, up to 18,000 Tweets)

Alternative Method




NodeXL's Basic (free) version allows up to 2,000 Tweets.
NodeXL's Pro (paid) version allows up to 18,000 Tweets.
The advantage of NodeXL is that it lets researchers quickly explore network graphs and relationships.

Historic Twitter Data

Archive Team: The Twitter Stream Grab

“A simple collection of JSON grabbed from the general twitter stream, for the purposes of research, history, testing and memory. This is the “Spritzer” version, the most light and shallow of Twitter grabs. Unfortunately, we do not currently have access to the Sprinkler or Garden Hose versions of the stream.”

Monthly archives are tarballs (.tar) containing hourly Tweet archives compressed as bzip2 files (.bz2). Once decompressed, these files are in the standard Twitter JSON format and contain all fields.
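Because bzip2 files can be read without first decompressing them to disk, Python's standard `bz2` and `json` modules are enough to stream Tweets out of one file. The sketch below assumes one JSON object per line (the usual layout for streaming-API dumps) and that records without a `text` field, such as deletion notices, can be skipped; `read_tweets` is a hypothetical helper name, not part of any Baylor tool.

```python
import bz2
import json
from typing import Iterator


def read_tweets(path: str) -> Iterator[dict]:
    """Yield Tweet objects from one bzip2-compressed, newline-delimited JSON file.

    Assumes one JSON object per line. Blank or corrupt lines are skipped, as are
    objects without a "text" field (for example, deletion notices).
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or corrupt records
            if isinstance(obj, dict) and "text" in obj:
                yield obj
```

Iterating over `read_tweets("00.json.bz2")` (a hypothetical file name) yields dictionaries, so fields such as `tweet["user"]["screen_name"]` and `tweet["text"]` can be read directly.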

Baylor Libraries Python Script to Mine Archive Team: The Twitter Stream Grab content


Requirements: Anaconda Python

Video Walk-Through: Stream Video (no audio)

The Archive Team's Twitter Stream Grab provides historic Twitter archives by month. This script helps researchers mine this content for a list of words, phrases, or hashtags. The monthly archives must be downloaded and extracted from the .tar archive before the script is used.
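The extraction step can also be done from Python with the standard `tarfile` module instead of a desktop archive tool. A minimal sketch, with a hypothetical function name and folder names; it skips any archive member whose path would land outside the destination folder and leaves the inner .bz2 files compressed:

```python
import tarfile
from pathlib import Path


def extract_month(tar_path: str, dest_dir: str) -> list[Path]:
    """Extract a monthly .tar archive and return the .bz2 files found afterwards.

    Only members that resolve inside dest_dir are extracted, as a guard
    against archive entries with absolute or "../" paths.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, mode="r") as archive:
        safe_members = [
            member for member in archive.getmembers()
            if (dest / member.name).resolve().is_relative_to(dest)
        ]
        archive.extractall(path=dest, members=safe_members)
    # The extracted files stay bzip2-compressed; they can be read as-is later.
    return sorted(dest.rglob("*.bz2"))
```

`Path.is_relative_to` requires Python 3.9 or later, which current Anaconda distributions provide.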

Output is a .csv file containing one record per relationship. Relationships are classified as (1) reply, (2) mention, or (3) tweet. A reply is a direct response to another user's post. A mention is a Tweet in which another user is mentioned but that is not a direct reply to them. A tweet relationship is a Tweet with neither a reply nor a mention.
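One plausible way to implement this classification is sketched below. It is not the Baylor script: it assumes Tweets in the Twitter v1.1 JSON format (fields `user.screen_name`, `in_reply_to_screen_name`, and `entities.user_mentions`), and the function and CSV column names are hypothetical. A Tweet that mentions three users yields three rows, while a Tweet with no reply or mention gets a single `tweet` row whose source and target are both the author.

```python
import csv
from typing import Iterable

FIELDS = ["source", "target", "relationship", "tweet_id", "text"]


def relationships(tweet: dict) -> list[dict]:
    """Return one row per relationship for a Tweet in Twitter v1.1 JSON format.

    reply   : the author replied directly to in_reply_to_screen_name
    mention : the author mentioned a user who is not the reply target
    tweet   : neither a reply nor a mention (author is source and target)
    """
    source = tweet["user"]["screen_name"]
    base = {"source": source, "tweet_id": tweet.get("id_str", ""),
            "text": tweet.get("text", "")}
    rows = []

    reply_to = tweet.get("in_reply_to_screen_name")
    if reply_to:
        rows.append({**base, "target": reply_to, "relationship": "reply"})

    for mention in tweet.get("entities", {}).get("user_mentions", []):
        target = mention.get("screen_name")
        # A reply's @-target usually appears in user_mentions as well;
        # skip it so the same link is not counted twice.
        if target and target.lower() != (reply_to or "").lower():
            rows.append({**base, "target": target, "relationship": "mention"})

    if not rows:
        rows.append({**base, "target": source, "relationship": "tweet"})
    return rows


def write_csv(tweets: Iterable[dict], output_file: str) -> None:
    """Write every relationship row for the given Tweets to a .csv file."""
    with open(output_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for tweet in tweets:
            writer.writerows(relationships(tweet))
```

Keeping source and target as separate columns makes the output easy to load into network tools as an edge list.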

In the script's modify section, specify (1) the keywords/hashtags, (2) the top-level directory, and (3) the output file name.
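The Baylor script itself is not reproduced on this page. As a hypothetical illustration of what a modify section and keyword search might look like, the sketch below uses three assumed settings (`KEYWORDS`, `TOP_LEVEL_DIR`, `OUTPUT_FILE`) corresponding to the three items listed above, and assumes the extracted .bz2 files hold one Tweet JSON object per line.

```python
import bz2
import csv
import json
from pathlib import Path

# ---- modify section (hypothetical names and values) ----
KEYWORDS = ["#datascience", "open data"]   # words, phrases, or hashtags
TOP_LEVEL_DIR = "twitter-stream-2017-01"   # folder holding the extracted .bz2 files
OUTPUT_FILE = "matches.csv"                # name of the .csv file to write
# --------------------------------------------------------


def matches(text: str, keywords: list[str]) -> bool:
    """Case-insensitive substring match against any keyword or phrase."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def search(top_dir: str, keywords: list[str], output_file: str) -> int:
    """Scan every .bz2 file under top_dir and write matching Tweets to a CSV.

    Returns the number of matching Tweets written.
    """
    count = 0
    with open(output_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["tweet_id", "screen_name", "created_at", "text"])
        for path in sorted(Path(top_dir).rglob("*.bz2")):
            with bz2.open(path, mode="rt", encoding="utf-8") as handle:
                for line in handle:
                    try:
                        tweet = json.loads(line)
                    except json.JSONDecodeError:
                        continue  # blank or corrupt line
                    if not isinstance(tweet, dict):
                        continue
                    text = tweet.get("text")
                    if text and matches(text, keywords):
                        writer.writerow([
                            tweet.get("id_str", ""),
                            tweet.get("user", {}).get("screen_name", ""),
                            tweet.get("created_at", ""),
                            text,
                        ])
                        count += 1
    return count
```

Calling `search(TOP_LEVEL_DIR, KEYWORDS, OUTPUT_FILE)` writes the matches and returns their count. Plain substring matching is a deliberate simplification: a keyword such as `data` also matches `database`, so a word-boundary regular expression may suit single words better. Longer Tweets from the streaming API may also carry their full text in `extended_tweet.full_text` rather than in `text`.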


Since 2016, Facebook has locked down much of its content, making it difficult to mine for research purposes.

What is accessible?


Workshop participants will mine Reddit text and images by subreddit.
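As a minimal sketch of subreddit mining with PRAW (not the Baylor tool), the code below reads a subreddit's hot posts in read-only mode and flags posts whose URL points directly at an image. It assumes PRAW is installed and that you have registered a Reddit app to obtain a client ID, client secret, and user agent; `mine_subreddit` and `is_image_url` are hypothetical names, and the file-extension check is a simple heuristic.

```python
from pathlib import Path
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def is_image_url(url: str) -> bool:
    """True if the URL's path ends in a common image file extension."""
    return Path(urlparse(url).path).suffix.lower() in IMAGE_EXTENSIONS


def mine_subreddit(name: str, client_id: str, client_secret: str,
                   user_agent: str, limit: int = 25) -> list[dict]:
    """Collect text and image links from a subreddit's hot posts via PRAW.

    Requires `pip install praw` and Reddit API credentials; with only a
    client ID, secret, and user agent, PRAW runs in read-only mode.
    """
    import praw  # imported here so is_image_url works without PRAW installed

    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    posts = []
    for submission in reddit.subreddit(name).hot(limit=limit):
        posts.append({
            "id": submission.id,
            "title": submission.title,
            "selftext": submission.selftext,  # body text; empty for link posts
            "url": submission.url,
            "is_image": is_image_url(submission.url),
        })
    return posts
```

For example, `mine_subreddit("AskHistorians", "YOUR_ID", "YOUR_SECRET", "research-script by u/yourname")` (placeholder credentials) returns a list of dictionaries; the image URLs it flags could then be saved with `urllib.request.urlretrieve`.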


Mine Reddit Subgroups - Tool created by Baylor University Libraries


Cheat Code:

Workshop participants will mine Instagram images by hashtag.



Mine Instagram by Hashtag - Tool created by Baylor University Libraries

University Libraries

One Bear Place #97148
Waco, TX 76798-7148

(254) 710-6702