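library(tidyverse)   # read_csv(), bind_rows(), mutate()
library(lubridate)   # ymd_hms()

# Read the tweet archives (the commented lines are the local-file versions)
#barack <- read_csv("Data/tweets_potus.csv")
barack <- read_csv("https://proback.github.io/264_fall_2025/Data/tweets_potus.csv")
#michelle <- read_csv("Data/tweets_flotus.csv")
michelle <- read_csv("https://proback.github.io/264_fall_2025/Data/tweets_flotus.csv")

# Stack both accounts, label each row with its author, and parse the timestamp
tweets <- bind_rows(barack |>
                      mutate(person = "Barack"),
                    michelle |>
                      mutate(person = "Michelle")) |>
  mutate(timestamp = ymd_hms(timestamp))
Mini-Project 1: Text Analysis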
Overview
You will find a data set containing string data. This could be newspaper articles, tweets, songs, plays, movie reviews, or anything else you can imagine. Then you will answer questions of interest and tell a story about your data using skills you have developed in strings, regular expressions, and text analysis.
Your story must contain the following elements:
- at least 3 different str_ functions
- at least 3 different regular expressions
- at least 2 different text analysis applications (count words, bing sentiment, afinn sentiment, nrc sentiment, wordclouds, trajectories over sections or time, tf-idf, bigrams, correlations, networks, LDA, etc.). Note that many interesting insights can be gained by strategic and thoughtful use of regular expressions paired with simple counts and summary statistics.
- at least 3 illustrative, well-labeled plots or tables, one of which is described with alt-text using the Four Ingredients Model
- a description of what insights can be gained from your plots and tables. Be sure you weave a compelling and interesting story!
Be sure to highlight the elements above so that they are easy for me to spot!
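To make the checklist concrete, here is a minimal sketch on a made-up three-row data set. The tibble `docs`, its columns, and every sentence in it are invented for illustration; your own data and questions will look different. It assumes the dplyr, stringr, and tidytext packages are installed.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Toy data standing in for a real corpus (invented for illustration)
docs <- tibble(
  id = 1:3,
  text = c(
    "I love this happy song!",
    "What a terrible, sad ending...",
    "Call me at 555-1234 or 555-9876."
  )
)

# Three different str_ functions, each paired with a different regular expression
docs <- docs |>
  mutate(
    has_phone  = str_detect(text, "\\d{3}-\\d{4}"),  # contains a phone-like pattern?
    n_punct    = str_count(text, "[[:punct:]]"),     # number of punctuation marks
    first_word = str_extract(text, "^\\w+")          # first word of the text
  )

# Text analysis application 1: word counts after dropping stop words
word_counts <- docs |>
  unnest_tokens(word, text) |>
  anti_join(stop_words, by = "word") |>
  count(word, sort = TRUE)

# Text analysis application 2: bing sentiment, tallied per document
sentiment_by_doc <- docs |>
  unnest_tokens(word, text) |>
  inner_join(get_sentiments("bing"), by = "word") |>
  count(id, sentiment)
```

From here, `word_counts` and `sentiment_by_doc` are ordinary tibbles, so they can feed straight into a labeled plot or table.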
You will hand in this project by pushing your work (qmd file, data set(s), and rendered pdf) to GitHub and providing me with a link. If you have a private repository, you might have to add me (proback) as a collaborator.
Evaluation Rubric
Available here.
Timeline
Mini-Project 1 must be submitted on Moodle by 11:00 PM on Wed Oct 1.
Topic Ideas
Obama tweets
President Barack Obama became the first US President with an official Twitter account when @POTUS went live on May 18, 2015. (Yes, there was a time before Twitter/X.) First Lady Michelle Obama got in on Twitter much earlier, though her first tweet was not from @FLOTUS. All of the tweets from @POTUS and @FLOTUS are now archived on Twitter as @POTUS44 and @FLOTUS44, and they are available as a CSV download from the National Archives. You can read more here.
Potential things to investigate:
- use of specific terms
- use of @, #, RT (retweet), or -mo (personal tweet from Michelle Obama)
- timestamp for date and time trends
- sentiment analysis
- anything else that seems interesting!
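As one possible starting point for the @, #, RT, and -mo ideas, the sketch below flags each marker with a regular expression and summarizes by person. It runs on four made-up tweets; the column name `text` is an assumption about the real files, so check the actual column names first. The patterns are also assumptions: they treat RT as appearing at the start of a tweet and -mo at the end, written with a plain hyphen, which the archived tweets may not always follow.

```r
library(dplyr)
library(stringr)

# Made-up tweets; the `text` column name is an assumption about the real data
toy_tweets <- tibble(
  person = c("Barack", "Michelle", "Michelle", "Barack"),
  text = c(
    "Hello, Twitter! It's Barack. Really! #POTUS",
    "RT @WhiteHouse: Watch live at 2pm ET",
    "Happy birthday to my love! -mo",
    "Thanks to @NASA for the view. #science #space"
  )
)

marker_summary <- toy_tweets |>
  mutate(
    is_retweet  = str_detect(text, "^RT\\b"),     # assumes retweets start with "RT"
    is_personal = str_detect(text, "-mo\\s*$"),   # assumes the signature ends the tweet
    n_mentions  = str_count(text, "@\\w+"),       # @handles
    n_hashtags  = str_count(text, "#\\w+")        # #hashtags
  ) |>
  group_by(person) |>
  summarize(across(is_retweet:n_hashtags, sum))   # totals per person

marker_summary
```

Swapping `toy_tweets` for the combined `tweets` data (and adjusting the column name if needed) gives a quick comparison of how the two accounts used each marker.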
Dear Abby advice column
Read in the “Dear Abby” data underlying The Pudding’s 30 Years of American Anxieties article.
posts <- read_csv("https://raw.githubusercontent.com/the-pudding/data/master/dearabby/raw_da_qs.csv")
Take a couple minutes to scroll through the 30 Years of American Anxieties article to get ideas for themes that you might want to search for and illustrate using regular expressions.
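A theme search might look something like the sketch below, which tracks the share of letters mentioning an in-law in each year. The data are four invented letters, and the column names `year` and `question` are hypothetical; inspect the real `posts` data to find the matching columns before adapting it.

```r
library(dplyr)
library(stringr)

# Invented letters; `year` and `question` are hypothetical column names
toy_posts <- tibble(
  year = c(1990, 1990, 2005, 2005),
  question = c(
    "My mother-in-law criticizes my cooking at every holiday.",
    "My husband forgot our anniversary again.",
    "My boss emails me at midnight and expects a reply.",
    "Should I invite my mother in law to the wedding?"
  )
)

# Match "mother-in-law", "brother in law", etc., ignoring case
in_law_pattern <- regex(
  "\\b(mother|father|sister|brother)[- ]in[- ]law\\b",
  ignore_case = TRUE
)

theme_by_year <- toy_posts |>
  mutate(mentions_in_law = str_detect(question, in_law_pattern)) |>
  group_by(year) |>
  summarize(
    n_posts     = n(),                    # letters that year
    prop_in_law = mean(mentions_in_law)   # share mentioning an in-law
  )

theme_by_year
```

Reporting a proportion rather than a raw count keeps years with many letters from dominating the trend.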
Other sources for string data
- Other articles from The Pudding
- NY Times headlines from the RTextTools package (see below)
- further analysis with the bigspotify data from class
- Tidy Tuesday
- kaggle
- Data Is Plural
- the options are endless – be resourceful and creative!
library(RTextTools) # may have to install first
data(NYTimes)
as_tibble(NYTimes)
# A tibble: 3,104 × 5
   Article_ID Date      Title                                 Subject Topic.Code
        <int> <fct>     <fct>                                 <fct>        <int>
 1      41246 1-Jan-96  Nation's Smaller Jails Struggle To C… Jails …         12
 2      41257 2-Jan-96  FEDERAL IMPASSE SADDLING STATES WITH… Federa…         20
 3      41268 3-Jan-96  Long, Costly Prelude Does Little To … Conten…         20
 4      41279 4-Jan-96  Top Leader of the Bosnian Serbs Now … Bosnia…         19
 5      41290 5-Jan-96  BATTLE OVER THE BUDGET: THE OVERVIEW… Battle…          1
 6      41302 7-Jan-96  South African Democracy Stumbles on … politi…         19
 7      41314 8-Jan-96  Among Economists, Little Fear on Def… econom…          1
 8      41333 10-Jan-96 BATTLE OVER THE BUDGET: THE OVERVIEW… budget…          1
 9      41344 11-Jan-96 High Court Is Cool To Census Change   census…         20
10      41355 12-Jan-96 TURMOIL AT BARNEYS: THE DIFFICULTIES… barney…         15
# ℹ 3,094 more rows
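Notice that Date and Title are stored as factors (`<fct>`), so it may help to convert them before doing any string work. The sketch below shows one way on a two-row stand-in: the first row copies a date and title from the printed output above, and the second title is made up. It assumes an English locale so that month abbreviations like "Jan" parse correctly.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Two-row stand-in for NYTimes; the second title is invented for illustration
toy_nyt <- tibble(
  Date  = factor(c("11-Jan-96", "12-Jan-96")),
  Title = factor(c("High Court Is Cool To Census Change",
                   "A MADE-UP HEADLINE IN ALL CAPS"))
)

nyt_clean <- toy_nyt |>
  mutate(
    Date     = dmy(as.character(Date)),      # factor -> character -> Date
    Title    = as.character(Title),          # factor -> character
    all_caps = !str_detect(Title, "[a-z]"),  # TRUE if no lowercase letters
    n_words  = str_count(Title, "\\S+")      # runs of non-whitespace
  )

nyt_clean
```

With a real Date column in place, headline features such as `all_caps` can be tracked over time.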