Enroll in the Data Scholar Canvas course here!
Data Scripting: Python II: Leveraging Third-Party Libraries
This workshop will cover importing libraries and other packages into Python scripts using Jupyter Notebooks on Google's Colaboratory platform.
Participants will work through a series of hands-on exercises.
Data Scholar program: (1) take workshops, (2) pass quizzes, (3) become a Data Scholar.
Interested in becoming a Data Scholar?
It takes only six workshops!
- Pick any two categories below and take at least two workshops from each of those categories (total of 4).
- Pick any one category below and take at least two workshops from that category (total of 2).
* Becoming a Data Scholar is not mandatory. Take any workshop you like.
Head to Google Colaboratory: https://colab.research.google.com
Click New Python 3 Notebook. If you do not see this popup, click File > New Python 3 Notebook.
Common pip commands:
!pip freeze (displays all installed modules and their versions)
!pip search twitter (searches PyPI for packages matching "twitter"; note that PyPI has since disabled this search endpoint, so the command may return an error)
import math
help(math)

To call a specific function, open a new code cell and type math followed by a dot. If nothing happens, hit the Tab key to list the module's contents. Select math.pi and run the cell.

help(math.degrees)
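A minimal sketch of calling a few of the functions that tab completion surfaces in the math module:

```python
import math

# Constants and functions from the math module
print(math.pi)                 # the constant selected above
print(math.sqrt(2))            # square root
print(math.degrees(math.pi))   # convert radians to degrees
```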
QUIZ: How many degrees is 5 radians?
Head to the VADER Sentiment GitHub site.

Install VADER Sentiment: !pip install vaderSentiment

Import VADER Sentiment and run help(SentimentIntensityAnalyzer)
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentences = ['I hope everyone has a fantastic day!', 'I hate when Baylor loses football.']
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    print(vs)
Pandas

Import Pandas: import pandas as pd
Download the practice csv table: https://researchguides.baylor.edu/ld.php?content_id=51320019

Upload to Colab: click the little tab on the left, click Files, and upload practice_data.csv
In a code cell, type pd.read and pause a moment until the list of supported structured file types appears. (Make sure you have run import pandas as pd first.)
pd.read_csv('practice_data.csv')
Store the table in a dataframe: practice_table = pd.read_csv('practice_data.csv')
Isolate a dataframe column: practice_table.column_a
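The read, store, and isolate steps can be sketched end to end; the tiny csv written here is a hypothetical stand-in for practice_data.csv so the example is self-contained:

```python
import pandas as pd

# Write a small stand-in for practice_data.csv
pd.DataFrame({'column_a': [1, 2, 3], 'column_b': [10, 20, 30]}).to_csv(
    'practice_data.csv', index=False)

practice_table = pd.read_csv('practice_data.csv')  # store the table in a dataframe
column = practice_table.column_a                   # isolate a single column
print(column)
```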
Plot column_a by column_b (run import matplotlib.pyplot as plt first):
plt.plot(practice_table.column_a, practice_table.column_b)
To add another plot, column_a by column_c:
Copy the plt.plot line, paste it beneath, and change b to c.
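Put together, the copy-and-change-b-to-c step looks like this (made-up numbers stand in for practice_data.csv):

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice table
practice_table = pd.DataFrame({'column_a': [1, 2, 3, 4],
                               'column_b': [10, 20, 15, 30],
                               'column_c': [5, 25, 10, 40]})

plt.plot(practice_table.column_a, practice_table.column_b)
plt.plot(practice_table.column_a, practice_table.column_c)  # pasted copy, b changed to c
plt.show()
```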
Quiz: Add a title to this plot
Compare population growth between the U.S. and China.
Download world_pop.csv https://researchguides.baylor.edu/ld.php?content_id=51320021
Upload to Colab
world_data=pd.read_csv('world_pop.csv')
Create a dataframe containing only data for the United States:
us=world_data[world_data.country=='United States']
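The boolean-mask line above can be tried on a hypothetical miniature of world_pop.csv (column names country, year, and population are assumed from the surrounding code):

```python
import pandas as pd

# Tiny stand-in for world_pop.csv
world_data = pd.DataFrame({
    'country': ['United States', 'United States', 'China', 'China'],
    'year': [2000, 2010, 2000, 2010],
    'population': [282_000_000, 309_000_000, 1_262_000_000, 1_337_000_000],
})

# world_data.country == 'United States' yields a True/False mask;
# indexing with it keeps only the matching rows
us = world_data[world_data.country == 'United States']
print(us)
```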
Quiz: Create a dataframe containing only data for China called china
Plot U.S. population by year:
plt.plot(us.year,us.population)
plt.show()
Notice the scientific notation? Adjust by dividing population by 1 million:
plt.plot(us.year,us.population / 1000000)
plt.show()
Add China population by year to our plot: plt.plot(china.year,china.population / 1000000)
Complete the plot with a title, labels, and a legend.
plt.plot(us.year,us.population / 1000000)
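One way the finished figure might look, with hypothetical numbers standing in for the us and china dataframes:

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the us and china dataframes
us = pd.DataFrame({'year': [2000, 2010], 'population': [282_000_000, 309_000_000]})
china = pd.DataFrame({'year': [2000, 2010], 'population': [1_262_000_000, 1_337_000_000]})

plt.plot(us.year, us.population / 1000000, label='United States')
plt.plot(china.year, china.population / 1000000, label='China')
plt.title('Population by Year')      # title
plt.xlabel('Year')                   # axis labels
plt.ylabel('Population (millions)')
plt.legend()                         # legend, built from the label= arguments
plt.show()
```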
Adjust to show % population growth per year instead of raw population counts.
In a new code cell: us.population
Values can be queried by integer position using iloc[]
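A small illustration of iloc[] on a throwaway series:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
print(s.iloc[0])   # first value, by integer position
print(s.iloc[-1])  # last value
```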
Divide each year of U.S. population by the first year's population, times 100, to calculate percent.
In the previous code cell, adjust the plt.plot lines to show % change instead of raw counts.
plt.plot(us.year,us.population / us.population.iloc[0] * 100)
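The divide-by-the-first-year arithmetic, shown on a hypothetical population series (values in millions are made up):

```python
import pandas as pd

population = pd.Series([282.2, 309.3, 331.4], index=[2000, 2010, 2020])

# Each year divided by the first year's value (iloc[0]), times 100 = percent of baseline
pct = population / population.iloc[0] * 100
print(pct)
```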
Natural Language Processing with spaCy
Download trump2019sotu.txt: https://researchguides.baylor.edu/ld.php?content_id=51320244
Create a new Python 3 Notebook and upload trump2019sotu.txt to Colab
Read the text file into an address variable:
f=open('trump2019sotu.txt','r')
address=f.read()
f.close()
print(address)
Create a new code cell.
Import spacy and create an NLP object from address that we can run NLP functions against (on newer spaCy versions, use spacy.load('en_core_web_sm') instead of spacy.load('en')):
import spacy
nlp=spacy.load('en')
doc=nlp(address)
Print the lemma form of each word:
for token in doc:
    print(token.lemma_)
To see parts of speech instead of lemmas, change token.lemma_ to token.pos_
Other token attributes include token.text, token.tag_, and token.is_stop.
Quiz: Adjust the above code cell to show whether a word is a stop word or not
Create a structured list of words, parts of speech, and stop word status:
word_list=[]
for token in doc:
    word_list.append([token.text, token.pos_, token.is_stop])
In a new code cell, convert the list to a pandas dataframe:
import pandas as pd
word_table=pd.DataFrame(word_list, columns=['text','pos','is_stop'])
Write to a csv table: word_table.to_csv('word_table.csv')
Calculate token frequencies: word_table['text'].value_counts()
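The list-to-dataframe-to-frequencies pipeline can be sketched without spaCy; the rows below are hypothetical tokens and the column names are assumptions:

```python
import pandas as pd

# Hypothetical rows like those appended in the spaCy loop: [text, pos, is_stop]
word_list = [['the', 'DET', True],
             ['state', 'NOUN', False],
             ['of', 'ADP', True],
             ['the', 'DET', True],
             ['union', 'NOUN', False]]

word_table = pd.DataFrame(word_list, columns=['text', 'pos', 'is_stop'])
word_table.to_csv('word_table.csv', index=False)   # write to a csv table
print(word_table['text'].value_counts())
```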
Walk-Through Jupyter Notebook Here:
https://colab.research.google.com/drive/1pnXSZCvd_qjwZxyERQ10CIau8ZiMjKo6
Calculate the Flesch Reading Ease Score of Trump's 2019 State of the Union Address.
f=open('trump2019sotu.txt','r')
address=f.read()
f.close()
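For the exercise, the Flesch Reading Ease formula is 206.835 - 1.015*(total words / total sentences) - 84.6*(total syllables / total words). A sketch with made-up counts; the real exercise derives the counts from address:

```python
# Made-up counts; the real exercise computes these from the address text
words, sentences, syllables = 1000, 50, 1500

score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
print(score)
```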
Copyright © Baylor® University. All rights reserved.