Skip to Main Content

BIO 2V9R - Analyzing HPV Survey Data & Mining Twitter Content

Analyze HPV Survey Data


We will focus primarily on Cleaning and Organizing the survey content already collected.

We will also begin Exploring the content for patterns and trends through Data Visualizations (bar charts and scatter plots) and basic exploratory measurements (linear trends lines and correlations).

Key Excel Tools Used

Source: Forbes

  • Part 1: Cleaning Data (~20 minutes)
    • Power Query Editor (PQE)
      • PQE is Microsoft's data cleaning and transformation tool. We will use PQE to process the Likert scale text to ordinal values and split the Activities that can spread HPV.
      • Functions in the PQE that we will use:
        • Process Likert Columns
          • Add an Index column
          • Unpivot all Likert columns
          • Rename Attribute to Likert Questions and rename Values to Likert Values
          • Trim Likert Values
          • Enter Data and paste table from below
          • Rename pasted table to Ordinal
          • Trim the Text column of Ordinal table
          • Merge Ordinal to Table1
          • Select only the Ordinal column
          • Add Conditional Column named Intensity (ordinal greater then 3 = high, greater than 0 = low, else other)
        • Process Activities that can spread HPV Column
          • Split column by delimeter
          • Select the split columns and unpivot
          • Rename Values to Activities that can spread
          • Trim Activities that can spread
    • Close & Load, done with PQE
  • Part 2: Compare & Measure Proportions (~10 minutes)
    • Proportions with Slicers
      • Table1 - Insert Pivot Chart (be sure to click add this to the Data Model)
      • Drag Index to Values and adjust to Distinct Count
      • Drag Ordinal to Axis
      • Right-click Likert Questions and Add As Slicer
      • Probably will want to adjust placement and sizes of slicer and chart
      • Select a question in the Slicer to adjust the chart (we worked hard for this, hopefully it was worth the work?)
      • Right-click Activities that can spread and Add as Slicer
      • To see the Intensity instead of ordinal values, replace the Axis (categories) with Intensity
    • Analyzing Various Groups
      • Drag to Legend (e.g., race or gender columns)
  • Part 3: Scatter Plot (~10 minutes)
    • Create a Pivot Table from Table1 (be sure to click add this to the Data Model)
      • Likert Questions to columns
      • Ordinal to rows
      • Index as Distinct Coount to values
    • Create Scatter Plot using first two columns
    • Create slicer from Likert Questions to filter for two columns for scatter
    • Choose Quick Layout 9 for trend line, trend line formula, and r-squared
  • Part 4: Additional Statistical Measurements

We will use a dictionary table to code the original text values to the assigned ordinal values.

Text Ordinal
Strongly agree 7
Agree 6
Somewhat agree 5
Neither agree nor disagree 4
Somewhat disagree 3
Disagree 2
Strongly disagree 1
I am not religiously affiliated. 0
My religion has no teachings on sexual behavior. 0

Mine Content from Twitter

screenshot of NCapture workflow from

Steps to Collect Tweets as Data

  1. Install NCapture to your Chrome-based browser (Chrome, Edge, etc.):
  2. Navigate to and log in with any Twitter account:
  3. Search for a keyword or hashtag on
  4. Click the NCapture extension icon, and if this is the first time you are using the extension, accept any terms of service.

Options to Analyze Collected Tweets

  1. Use NVivo. NVivo is qualitative data analysis software designed to allow researchers to analyze qualitative content such as the text of Tweets. NVivo is installed in Jones 105. Any Baylor student canb download and install NVivo on their personal computer (Windows or Mac) from here:
  2. Use this custom Google Colab tool created by the Baylor University Libraries to process NCapture files without NVivo:

University Libraries

One Bear Place #97148
Waco, TX 76798-7148

(254) 710-6702