How will Texas counties vote during the 2020 Presidential election?
Description
This hands-on workshop will guide participants through the process of constructing a decision tree to model 2016 votes for Donald Trump in Texas using various demographic attributes.
Using 2016 data, we will train our model and then test our model to measure its accuracy.
Using 2020 forecast data, we will use our model to predict 2020 votes for Donald Trump.
Learning Outcomes
Participants will be able to solve basic classification problems by constructing a basic decision tree model.
Participants will understand the basics of decision tree modeling.
Participants will be able to construct a workflow using the Knime Analytics Platform.
The Baylor Libraries’ Data Scholar Program provides a series of hands-on workshops designed to help Baylor researchers learn about advanced data research methods, tools, and sources.
Data can be Text, Numbers, and Multimedia
Knime stands for Konstanz Information Miner.
It is an open-source data analytics package available for Windows, Mac, and Linux.
Its drag-and-drop graphical interface lets you assemble nodes for data processing, analysis, modeling, and visualization.
Download at https://www.knime.com/
Variables included in the workshop dataset. All variables are at the voter tabulation district (VTD) level for Texas.
2016 election data collected from the Texas Legislative Council Public FTP Server. 2016 demographic data collected from the American Community Survey (2012-2016) via IPUMS NHGIS. 2020 demographic forecasts collected from Simply Analytics (Baylor only).
QGIS Desktop was used to aggregate the block-group ACS data to the VTD level.
Step #1: Launch the Knime application
Step #2: Explore Interface
Take a moment to explore the following components of the Knime interface:
Step #3: Create a new workflow window
Right-click LOCAL (Local Workspace) and select New KNIME Workflow.
Step #4: Name the workflow
Name: Decision Tree Trump Votes
Click OK.
Step #1: Import Excel data
In the Node Repository, expand IO and expand Read. Drag the Excel Reader (XLS) node to the Workflow Window.
Step #2: Configure Excel Reader Node
Double-click the Excel Reader node (or right-click and select Configure).
Click OK. The traffic light should turn from red to yellow.
Step #1: Add bin node
Search the Node Repository for bin. Drag the Auto-Binner node to the Workflow.
Step #2: Connect nodes
Drag from the tip of the Excel Reader node output to the edge of the Auto-Binner node input.
Step #3: Clean workflow
On the top toolbar, click the Open setting dialog button.
On the top toolbar, click the Auto Layout button.
Step #4: Configure Bin node
Double-click the Auto-Binner node.
Click OK.
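To make the binning step concrete, the following is a minimal Python sketch of equal-width binning, one of the strategies a binning node can apply. It is not the Auto-Binner's implementation. The function name `equal_width_bins`, the "Bin N" labels, and the vote-share values are assumptions made for illustration, and the workshop's actual Auto-Binner settings are not reproduced here.

```python
def equal_width_bins(values, n_bins=3):
    """Assign each value to one of n_bins equal-width intervals.

    Labels are illustrative ("Bin 1", "Bin 2", ...).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would land in bin n_bins, so clamp it to the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"Bin {idx + 1}")
    return labels


# Hypothetical vote shares (percent) for six districts; not workshop data.
trump_share = [12.0, 35.5, 48.0, 61.2, 77.9, 90.0]
binned = equal_width_bins(trump_share, n_bins=3)
print(binned)
```

Binning turns a continuous target into a small set of classes, which is what lets a classification tree be trained on it.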
Step #1: Add partition node
Search the Node Repository for partition. Drag the Partitioning node to the Workflow.
Step #2: Connect Auto-Binner node to Partitioning node
Step #3: Configure partitioning node
Double-click the Partitioning node.
Click OK.
All traffic lights should be at yellow.
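Conceptually, partitioning splits the rows into a training set and a test set. Below is a minimal Python sketch of a random split. The 80/20 ratio, the seed, and the function name `partition` are assumptions for illustration, not the settings used in the workshop.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical row identifiers standing in for districts.
rows = list(range(10))
train, test = partition(rows)
print(len(train), len(test))
```

The training rows are used to build the tree. The held-out test rows are used later to measure how well it predicts data it has not seen.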
Step #1: Add Decision tree learner node
Search the Node Repository for decision. Drag the Decision Tree Learner node to the Workflow.
Step #2: Connect the top output of the Partitioning node to the Decision Tree Learner node input
Step #3: Configure decision tree learner node
Double-click the Decision Tree Learner node.
Click OK.
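A decision tree learner repeatedly picks the attribute and threshold that best separate the classes. The sketch below shows that idea for a single numeric attribute, using the Gini index as the quality measure. It is a simplified illustration, not the Decision Tree Learner's implementation. The attribute `median_age`, the data values, and the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical attribute values and binned vote classes; not workshop data.
median_age = [28, 31, 35, 44, 47, 52]
vote_bin = ["Bin 1", "Bin 1", "Bin 1", "Bin 3", "Bin 3", "Bin 3"]
threshold, impurity = best_threshold(median_age, vote_bin)
print(threshold, impurity)
```

A full learner applies this search to every attribute, splits on the best one, and then repeats the process within each resulting branch.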
Step #1: Add Decision tree view node
Search the Node Repository for decision. Drag the Decision Tree View (JavaScript) node to the Workflow.
Step #2: Connect the Decision Tree Learner node blue output to the Decision Tree View (JavaScript) node blue input
Step #3: Run model
On the top toolbar, click the Execute all executable nodes button.
Step #4: View Decision tree
Right-click the Decision Tree View (JavaScript) node and select Interactive View: Decision Tree View.
Step #1: Add Decision predictor node
Search the Node Repository for decision. Drag the Decision Tree Predictor node to the Workflow.
Step #2: Connect the Partitioning node bottom output to the Decision Tree Predictor node input. Also connect the Decision Tree Learner node blue output to the Decision Tree Predictor node blue input.
Step #1: Add Scorer node
Search the Node Repository for scorer. Drag the Scorer node to the Workflow.
Step #2: Connect the Decision Tree Predictor node output to the Scorer node input.
Step #3: Configure scorer node
Double-click the Scorer node.
Click OK.
Step #4: Run Model
On the top toolbar, click the Execute all executable nodes button.
Step #5: Measure accuracy
Right-click the Scorer node and select View: Confusion Matrix.
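For readers who want to see what a confusion matrix and accuracy figure summarize, here is a minimal Python sketch. The class labels and the actual and predicted values are made up for illustration, and the helper names are hypothetical. The Scorer node reports additional statistics that are not shown here.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical test-set classes; not workshop results.
actual = ["Bin 1", "Bin 2", "Bin 3", "Bin 3", "Bin 1", "Bin 2"]
predicted = ["Bin 1", "Bin 2", "Bin 2", "Bin 3", "Bin 1", "Bin 3"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a} predicted={p} count={count}")
print(f"accuracy={acc:.3f}")
```

Pairs where the actual and predicted classes match sit on the diagonal of the matrix. Accuracy is the sum of the diagonal divided by the total number of rows.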
Question: If we adjusted the number of classes (bins) from 3 to 5, would the accuracy of the model increase or decrease?
Task: Make it so and see if you are correct!
Add a new Excel Reader (XLS) node.
Configure the node.
Add a new Decision Tree Predictor node.
Connect the newly added Excel Reader (XLS) node to this Decision Tree Predictor node.
Connect the Decision Tree Learner node to the newly added Decision Tree Predictor node.
Run the model.
Right-click the newly added Decision Tree Predictor node and select Classified Data.
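Applying a trained tree to new rows amounts to walking each row down the tree until it reaches a leaf. The Python sketch below illustrates this with a hand-written tree. The tree structure, the attribute names (`median_age`, `pct_rural`), the thresholds, and the forecast rows are all hypothetical and are not taken from the workshop model or dataset.

```python
def predict(tree, row):
    """Follow the splits in a nested-dict tree until a leaf (a class label) is reached."""
    node = tree
    while isinstance(node, dict):
        go_left = row[node["attribute"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "median_age",
    "threshold": 39.5,
    "left": "Bin 1",
    "right": {
        "attribute": "pct_rural",
        "threshold": 50.0,
        "left": "Bin 2",
        "right": "Bin 3",
    },
}

# Hypothetical forecast rows standing in for the 2020 data.
forecast_2020 = [
    {"median_age": 30, "pct_rural": 10.0},
    {"median_age": 45, "pct_rural": 20.0},
    {"median_age": 50, "pct_rural": 80.0},
]
predictions = [predict(tree, row) for row in forecast_2020]
print(predictions)
```

Because the forecast rows carry no known outcome, there is nothing to score. The output is only the predicted class for each row.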