Data & Digital Scholarship Tutorials

Procedures

How will Texas counties vote during the 2020 Presidential election?

Description

This hands-on workshop will guide participants through the process of constructing a decision tree to model 2016 votes for Donald Trump in Texas using various demographic attributes.

Using 2016 data, we will train our model and then test our model to measure its accuracy.

Using 2020 forecast data, we will use our model to predict 2020 votes for Donald Trump.

 

Learning Outcomes

Participants will be able to solve basic classification problems by constructing a basic decision tree model.

Participants will understand the basics of decision tree modeling.

Participants will be able to construct a workflow using the Knime Analytics Platform.

 

The Baylor Libraries’ Data Scholar Program provides a series of hands-on workshops designed to help Baylor researchers learn about advanced data research methods, tools, and sources.

Data can be Text, Numbers, and Multimedia

 
Assessment/Feedback: https://baylor.qualtrics.com/jfe/form/SV_aghfDPWozuY9dJP

Knime stands for Konstanz Information Miner.

It is an open-source data analytics package available for Windows, Mac, and Linux.

Its drag-and-drop graphical interface lets you assemble nodes for data processing, analysis, modeling, and visualization.

Download at https://www.knime.com/

 

 

The workshop dataset includes the following variables. All variables are at the voter tabulation district (VTD) level for Texas.

  1. % Votes for Donald Trump in the 2016 presidential election
  2. Population per square mile (2016, 2020)
  3. % White population (2016, 2020)
  4. % African American population (2016, 2020)
  5. % Hispanic population (2016, 2020)
  6. % Adults with bachelor's degree (2016, 2020)
  7. Median household income (2016, 2020)
  8. County

2016 election data collected from the Texas Legislative Council Public FTP Server. 2016 demographic data collected from the American Community Survey (2012-2016) via IPUMS NHGIS. 2020 demographic forecasts collected from Simply Analytics (Baylor only).

QGIS Desktop was used to aggregate the block-group ACS data to the VTD level.

Step #1: Launch the Knime application

 

Step #2: Explore Interface

 

Take a moment to explore the following components of the Knime interface:

  • Knime Explorer
  • Node Repository
  • Node Description
  • Workflow
  • Outline
  • Console

Step #3: Create a new workflow window

 

Right-click LOCAL (Local Workspace) and select New KNIME Workflow

 

Step #4: Name the workflow

 

Name: Decision Tree Trump Votes

 

Click OK

 

Step #1: Import Excel data

 

In the Node Repository, expand IO and expand Read.

Drag Excel Reader (XLS) to Workflow Window.

Step #2: Configure Excel Reader Node

 

Double-click Excel Reader Node (or right-click/Configure)

  1. Select file to read: Browse for Excel file
  2. Select the sheet to read: <first sheet with data>
  3. Table contains column names: Check
  4. Row IDs: Table contains row IDs in column A
  5. Under Preview: Click Reload

OK

Traffic light should turn from red to yellow.

 

Step #1: Add bin node

 

Search Node Repository for bin

Drag Auto-Binner node to Workflow.

Step #2: Connect nodes

 

Drag from the tip of the Excel Reader node output to the edge of the Auto-Binner node input.

Step #3: Clean workflow

 

On the top toolbar, click the Open setting dialog button.

  • Under Node Connections, check Curved connections.
  • OK

 

On the top toolbar, click the Auto Layout button.

Step #4: Configure Bin node

 

Double-click Auto-Binner node.

  1. Move all attributes to the Exclude (left) window except for Trump. Trump remains in the Include (right) window.
  2. Binning Method: Fixed number of bins: 3
  3. Bin Naming: Borders
  4. Replace target column(s): Check

 

OK
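
To make the binning step concrete, here is a minimal Python sketch of splitting a numeric column into three bins and naming each bin by its borders. It assumes equal-width bins; the function name, the label format, and the sample percentages are made up for illustration, and Knime's own border handling and labels may differ.

```python
def equal_width_bins(values, n_bins=3):
    """Assign each value to one of n_bins equal-width intervals.

    Each value is replaced by a label built from the borders of its bin,
    similar in spirit to choosing "Borders" for bin naming.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Bin borders: lo, lo + width, ..., hi
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    labels = []
    for v in values:
        # Index of the bin containing v; the maximum value goes in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        labels.append(f"{edges[idx]:.1f} to {edges[idx + 1]:.1f}")
    return labels


# Made-up Trump vote percentages for six districts (illustration only).
trump_share = [12.0, 35.5, 48.0, 61.2, 77.9, 90.0]
binned = equal_width_bins(trump_share, n_bins=3)
for value, label in zip(trump_share, binned):
    print(value, "->", label)
```

After this step the Trump column holds three class labels instead of a continuous percentage, which is what lets a classification model such as a decision tree use it as the target.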

 

Step #1: Add partition node

 

Search Node Repository for partition

Drag Partitioning node to Workflow.

Step #2: Connect Auto-Binner node to Partitioning node

Step #3: Configure partitioning node

 

Double-click Partitioning node

  1. Choose size of first partition: Relative[%]
  2. Relative[%]: 80

OK

 

All traffic lights should be at yellow.
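
The 80/20 split can be sketched in a few lines of Python. This is an illustration only: it assumes rows are drawn at random with a fixed seed, which may not match the sampling option selected in the Partitioning node, and the function name and row values are hypothetical.

```python
import random


def partition(rows, first_fraction=0.8, seed=42):
    """Split rows into two partitions; the first gets first_fraction of the rows."""
    shuffled = rows[:]                      # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for 100 table rows (illustration only).
rows = list(range(100))
train, test = partition(rows, first_fraction=0.8)
print(len(train), len(test))
```

The first partition plays the role of the top output (used to train the model) and the second the bottom output (held back to test it).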

 

Step #1: Add Decision tree learner node

 

Search Node Repository for decision.

Drag Decision Tree Learner node to Workflow.

Step #2: Connect the top output of the Partitioning node to the Decision Tree Learner node

Step #3: Configure decision tree learner node

 

Double-click Decision Tree Learner node

  1. Class column: Trump
  2. Quality Measure: Gini Index
  3. Pruning Method: MDL (minimum description length)

 

OK
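
For readers curious about the Gini Index quality measure, the sketch below computes the Gini impurity of a set of class labels and the weighted impurity of a candidate split. It is a generic textbook illustration with made-up labels, not a description of Knime's internal implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Impurity of a split: size-weighted average of the two child impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Made-up class labels for six districts (illustration only).
parent = ["low", "low", "mid", "mid", "high", "high"]
left, right = ["low", "low", "mid"], ["mid", "high", "high"]

print("parent impurity:", gini(parent))
print("split impurity: ", split_gini(left, right))
```

A split is attractive when its weighted impurity is lower than the parent's; a node containing a single class has impurity 0.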

 

Step #1: Add Decision tree view node

 

Search Node Repository for decision.

Drag Decision Tree View (JavaScript) node to Workflow.

Step #2: Connect Decision Tree Learner node blue output to Decision Tree View (JavaScript) node blue input

Step #3: Run model

 

On the top toolbar, click the Execute all executable nodes button.

Step #4: View Decision tree

 

Right-click the Decision Tree View (JavaScript) node and select Interactive View: Decision Tree View

 

Step #1: Add Decision predictor node

 

Search Node Repository for decision.

Drag Decision Tree Predictor node to Workflow.

Step #2: Connect the bottom output of the Partitioning node to the Decision Tree Predictor node input.

Also...

Connect Decision Tree Learner node blue output to Decision Tree Predictor node blue input
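
Conceptually, the predictor takes the trained tree and walks each test row from the root down to a leaf. The sketch below shows that traversal on a tiny hand-written tree; the feature names, thresholds, and class labels are hypothetical and are not results from the workshop data.

```python
def predict(node, row):
    """Follow the tree's tests from the root until a leaf label is reached."""
    while "label" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


# Hypothetical tree: every split and label here is invented for illustration.
tree = {
    "feature": "pct_bachelors", "threshold": 30.0,
    "left": {"label": "high"},
    "right": {
        "feature": "pop_density", "threshold": 1000.0,
        "left": {"label": "mid"},
        "right": {"label": "low"},
    },
}

row = {"pct_bachelors": 45.0, "pop_density": 2500.0}
print(predict(tree, row))
```

The two connections in this step mirror the two arguments of `predict`: the blue port carries the model, and the data port carries the rows to classify.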

 

Step #1: Add Scorer node

 

Search Node Repository for scorer.

Drag Scorer node to Workflow.

Step #2: Connect the Decision Tree Predictor node output to the Scorer node input.

Step #3: Configure scorer node

 

Double-click the scorer node

  • First column: Trump
  • Second column: Prediction (Trump)

 

OK

Step #4: Run Model

 

On the top toolbar, click the Execute all executable nodes button.

Step #5: Measure accuracy

 

Right-click the Scorer node and select View: Confusion Matrix
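
The confusion matrix and accuracy figure can be reproduced by hand from the two columns configured above. The following is a minimal sketch with made-up actual and predicted labels; the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of rows where the predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for six test rows (illustration only).
actual = ["low", "low", "mid", "mid", "high", "high"]
predicted = ["low", "mid", "mid", "mid", "high", "low"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:<5} predicted={p:<5} count={count}")
print("accuracy:", acc)
```

Pairs where the actual and predicted class match sit on the diagonal of the matrix; accuracy is the diagonal total divided by the number of rows.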

 

Question: If we adjusted the number of classes (bins) from 3 to 5, would the accuracy of the model increase or decrease?
Task: Make it so and see if you are correct!

 

Add a new Excel Reader (XLS) node

 

Configure the node to read the 2020 forecast data

Add a new Decision Tree Predictor node

 

Connect the newly added Excel Reader (XLS) node to this Decision Tree Predictor node.

 

Connect the Decision Tree Learner node to the newly added Decision Tree Predictor node.

 

Run model

Right-click the newly added Decision Tree Predictor node and select Classified Data.

 

Voter Tabulation Districts

There are 8,941 VTDs in Texas.

(8,654 of them had more than 0 votes cast)

University Libraries

One Bear Place #97148
Waco, TX 76798-7148

(254) 710-6702