Data Acquisition with APIs in R

You can download this .qmd file from here. Just hit the Download Raw File button.

Credit to Brianna Heggeseth and Leslie Myint from Macalester College for a few of these descriptions and examples.

Getting data from websites

Option 1: APIs

When we interact with sites like The New York Times, Zillow, and Google, we are accessing their data via a graphical layout (e.g., images, colors, columns) that is easy for humans to read but hard for computers.

An API stands for Application Programming Interface, and this term describes a general class of tool that allows computers, rather than humans, to interact with an organization’s data. How does this work?

When we use web browsers to navigate the web, our browsers communicate with web servers using a technology called HTTP or Hypertext Transfer Protocol to get information that is formatted into the display of a web page.
Programming languages such as R can also use HTTP to communicate with web servers. The easiest way to do this is via Web APIs, or Web Application Programming Interfaces, which focus on transmitting raw data, rather than images, colors, or other appearance-related information that humans interact with when viewing a web page.

A large variety of web APIs provide data accessible to programs written in R (and almost any other programming language!). Almost all reasonably large commercial websites offer APIs. Todd Motto has compiled an expansive list of Public Web APIs on GitHub, although it’s about 3 years old now so it’s not a perfect or complete list. Feel free to browse this list to see what data sources are available.

For our purposes of obtaining data, APIs exist where website developers make data nicely packaged for consumption. The language HTTP (hypertext transfer protocol) underlies APIs, and the R package httr() (and now the updated httr2()) was written to map closely to HTTP with R. Essentially you send a request to the website (server) where you want data from, and they send a response, which should contain the data (plus other stuff).

The case studies in this document provide a really quick introduction to data acquisition, just to get you started and show you what’s possible. For more information, these links can be somewhat helpful:

https://towardsdatascience.com/functions-with-r-and-rvest-a-laymens-guide-acda42325a77
https://nceas.github.io/oss-lessons/data-liberation/intro-webscraping.html

Wrapper packages

In R, it is easiest to use Web APIs through a wrapper package, an R package written specifically for a particular Web API.

The R development community has already contributed wrapper packages for many large Web APIs (e.g. ZillowR, rtweet, genius, Rspotify, tidycensus, etc.)
To find a wrapper package, search the web for “R package” and the name of the website. For example:
- Searching for “R Reddit package” returns RedditExtractor
- Searching for “R Weather.com package” returns weatherData
rOpenSci also has a good collection of wrapper packages.

In particular, tidycensus is a wrapper package that makes it easy to obtain desired census information for mapping and modeling:

Warning: • You have not set a Census API key. Users without a key are limited to 500
queries per day and may experience performance limitations.
ℹ For best results, get a Census API key at
http://api.census.gov/data/key_signup.html and then supply the key to the
`census_api_key()` function to use it throughout your tidycensus session.
This warning is displayed once per session.


  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |======================================================================| 100%

Obtaining raw data from the Census Bureau was that easy! Often we will have to obtain and use a secret API key to access the data, but that’s not always necessary with tidycensus.

Now we can tidy that data and produce plots and analyses.

# Rename cryptic variables from the census form
sample_acs_data <- sample_acs_data |>
  rename(population = B01003_001E,
         population_moe = B01003_001M,
         median_income = B19013_001E,
         median_income_moe = B19013_001M)

# Plot with geom_sf since our data contains 1 row per census tract
#   with its geometry
ggplot(data = sample_acs_data) + 
  geom_sf(aes(fill = median_income), colour = "white", linetype = 2) + 
  theme_void()

# The whole state of MN is overwhelming, so focus on a single county
sample_acs_data |>
  filter(str_detect(NAME, "Hennepin")) |>
  ggplot() + 
    geom_sf(aes(fill = median_income), colour = "white", linetype = 2)

# Look for relationships between variables with 1 row per tract
as_tibble(sample_acs_data) |>
  ggplot(aes(x = population, y = median_income)) + 
    geom_point() + 
    geom_smooth(method = "lm")

Extra resources:

tidycensus: wrapper package that provides an interface to a few census datasets with map geometry included!
- Full documentation is available at https://walker-data.com/tidycensus/
censusapi: wrapper package that offers an interface to all census datasets
- Full documentation is available at https://www.hrecht.com/censusapi/

get_acs() is one of the functions that is part of tidycensus. Let’s explore what’s going on behind the scenes with get_acs()…

Accessing web APIs directly

Getting a Census API key

Many APIs (and their wrapper packages) require users to obtain a key to use their services.

This lets organizations keep track of what data is being used.
It also rate limits their API and ensures programs don’t make too many requests per day/minute/hour. Be aware that most APIs do have rate limits — especially for their free tiers.

Navigate to https://api.census.gov/data/key_signup.html to obtain a Census API key:

Organization: St. Olaf College
Email: Your St. Olaf email address

You will get the message:

Your request for a new API key has been successfully submitted. Please check your email. In a few minutes you should receive a message with instructions on how to activate your new key.

Check your email. Copy and paste your key into a new text file:

File > New File > Text File (towards the bottom of the menu)
Save as census_api_key.txt in the same folder as this .qmd.

You could then read in the key with code like this:

myapikey <- readLines("~/264_fall_2024/DS2_preview_work/census_api_key.txt")

Handling API keys

While this works, the problem is once we start backing up our files to GitHub, your API key will also appear on GitHub, and you want to keep your API key secret. Thus, we might use environment variables instead:

One way to store a secret across sessions is with environment variables. Environment variables, or envvars for short, are a cross platform way of passing information to processes. For passing envvars to R, you can list name-value pairs in a file called .Renviron in your home directory. The easiest way to edit it is to run:

file.edit("~/.Renviron")

The file looks something like

PATH = “path” VAR1 = “value1” VAR2 = “value2” And you can access the values in R using Sys.getenv():

Sys.getenv("VAR1")
#> [1] "value1"

Note that .Renviron is only processed on startup, so you’ll need to restart R to see changes.

Another option is to use Sys.setenv and Sys.getenv:

# I used the first line to store my CENSUS API key in .Renviron
#   after uncommenting - should only need to run one time
# Sys.setenv("CENSUS_KEY" = "my census api key pasted here")
# my_census_api_key <- Sys.getenv("CENSUS_KEY")

Navigating API documentation

Navigate to the Census API user guide and click on the “Example API Queries” tab.

Let’s look at the Population Estimates Example and the American Community Survey (ACS) Example. These examples walk us through the steps to incrementally build up a URL to obtain desired data. This URL is known as a web API request.

https://api.census.gov/data/2019/acs/acs1?get=NAME,B02015_009E,B02015_009M&for=state:*

https://api.census.gov: This is the base URL.
- http://: The scheme, which tells your browser or program how to communicate with the web server. This will typically be either http: or https:.
- api.census.gov: The hostname, which is a name that identifies the web server that will process the request.
data/2019/acs/acs1: The path, which tells the web server how to get to the desired resource.
- In the case of the Census API, this locates a desired dataset in a particular year.
- Other APIs allow search functionality. (e.g., News organizations have article searches.) In these cases, the path locates the search function we would like to call.
?get=NAME,B02015_009E,B02015_009M&for=state:*: The query parameters, which provide the parameters for the function you would like to call.
- We can view this as a string of key-value pairs separated by &. That is, the general structure of this part is key1=value1&key2=value2.

key	value
get	NAME,B02015_009E,B02015_009M
for	state:*

Typically, each of these URL components will be specified in the API documentation. Sometimes, the scheme, hostname, and path (https://api.census.gov/data/2019/acs/acs1) will be referred to as the endpoint for the API call.

We will first use the httr2 package to build up a full URL from its parts.

request() creates an API request object using the base URL
req_url_path_append() builds up the URL by adding path components separated by /
req_url_query() adds the ? separating the endpoint from the query and sets the key-value pairs in the query
- The .multi argument controls how multiple values for a given key are combined.
- The I() function around "state:*" inhibits parsing of special characters like : and *. (It’s known as the “as-is” function.)
- The backticks around for are needed because for is a reserved word in R (for for-loops). You’ll need backticks whenever the key name has special characters (like spaces, dashes).
- We can see from here that providing an API key is achieved with key=YOUR_API_KEY.

# Request total number of Hmong residents and margin of error by state
#   in 2019, as in the User Guide
CENSUS_API_KEY <- Sys.getenv("CENSUS_API_KEY")
req <- request("https://api.census.gov") |> 
    req_url_path_append("data") |> 
    req_url_path_append("2019") |> 
    req_url_path_append("acs") |> 
    req_url_path_append("acs1") |> 
    req_url_query(get = c("NAME", "B02015_009E", "B02015_009M"), `for` = I("state:*"), key = CENSUS_API_KEY, .multi = "comma")

Why would we ever use these steps instead of just using the full URL as a string?

To generalize this code with functions! (This is exactly what wrapper packages do.)
To handle special characters
- e.g., query parameters might have spaces, which need to be represented in a particular way in a URL (URLs can’t contain spaces)

Once we’ve fully constructed our request, we can use req_perform() to send out the API request and get a response.

resp <- req_perform(req)
resp

We see from Content-Type that the format of the response is something called JSON. We can navigate to the request URL to see the structure of this output.

JSON (Javascript Object Notation) is a nested structure of key-value pairs.
We can use resp_body_json() to parse the JSON into a nicer format.
- Without simplifyVector = TRUE, the JSON is read in as a list.

resp_json_list <- resp |> resp_body_json()
head(resp_json_list, 2)

[[1]]
[[1]][[1]]
[1] "NAME"

[[1]][[2]]
[1] "B02015_009E"

[[1]][[3]]
[1] "B02015_009M"

[[1]][[4]]
[1] "state"


[[2]]
[[2]][[1]]
[1] "Illinois"

[[2]][[2]]
[1] "655"

[[2]][[3]]
[1] "511"

[[2]][[4]]
[1] "17"

resp_json_df <- resp |> resp_body_json(simplifyVector = TRUE)
head(resp_json_df)

     [,1]       [,2]          [,3]          [,4]   
[1,] "NAME"     "B02015_009E" "B02015_009M" "state"
[2,] "Illinois" "655"         "511"         "17"   
[3,] "Georgia"  "3162"        "1336"        "13"   
[4,] "Idaho"    NA            NA            "16"   
[5,] "Hawaii"   "56"          "92"          "15"   
[6,] "Indiana"  "1344"        "1198"        "18"

resp_json_df <- janitor::row_to_names(resp_json_df, 1)
head(resp_json_df)

     NAME       B02015_009E B02015_009M state
[1,] "Illinois" "655"       "511"       "17" 
[2,] "Georgia"  "3162"      "1336"      "13" 
[3,] "Idaho"    NA          NA          "16" 
[4,] "Hawaii"   "56"        "92"        "15" 
[5,] "Indiana"  "1344"      "1198"      "18" 
[6,] "Iowa"     "685"       "705"       "19"

All right, let’s try this! First we’ll grab total population and median household income for all census tracts in MN using 3 approaches

# First using tidycenus
library(tidycensus)
sample_acs_data <- tidycensus::get_acs(
    year = 2021,
    state = "MN",
    geography = "tract",
    variables = c("B01003_001", "B19013_001"),
    output = "wide",
    geometry = TRUE,
    county = "Hennepin",   # specify county in call
    show_call = TRUE       # see resulting query
)


  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |======================================================================| 100%

# Next using httr2
req <- request("https://api.census.gov") |> 
    req_url_path_append("data") |> 
    req_url_path_append("2020") |> 
    req_url_path_append("acs") |> 
    req_url_path_append("acs5") |> 
    req_url_query(get = c("NAME", "B01003_001E", "B19013_001E"), `for` = I("tract:*"), `in` = I("state:27"), `in` = I("county:053"), key = CENSUS_API_KEY, .multi = "comma")

resp <- req_perform(req)
resp
resp_json_df <- resp |> resp_body_json(simplifyVector = TRUE)
head(resp_json_df)

     [,1]                                            [,2]         
[1,] "NAME"                                          "B01003_001E"
[2,] "Census Tract 1.01, Hennepin County, Minnesota" "3472"       
[3,] "Census Tract 1.02, Hennepin County, Minnesota" "4992"       
[4,] "Census Tract 3, Hennepin County, Minnesota"    "3404"       
[5,] "Census Tract 6.01, Hennepin County, Minnesota" "4706"       
[6,] "Census Tract 6.03, Hennepin County, Minnesota" "3301"       
     [,3]          [,4]    [,5]     [,6]    
[1,] "B19013_001E" "state" "county" "tract" 
[2,] "70927"       "27"    "053"    "000101"
[3,] "46333"       "27"    "053"    "000102"
[4,] "82098"       "27"    "053"    "000300"
[5,] "71122"       "27"    "053"    "000601"
[6,] "96875"       "27"    "053"    "000603"

resp_json_df <- janitor::row_to_names(resp_json_df, 1)
head(resp_json_df)

     NAME                                            B01003_001E B19013_001E
[1,] "Census Tract 1.01, Hennepin County, Minnesota" "3472"      "70927"    
[2,] "Census Tract 1.02, Hennepin County, Minnesota" "4992"      "46333"    
[3,] "Census Tract 3, Hennepin County, Minnesota"    "3404"      "82098"    
[4,] "Census Tract 6.01, Hennepin County, Minnesota" "4706"      "71122"    
[5,] "Census Tract 6.03, Hennepin County, Minnesota" "3301"      "96875"    
[6,] "Census Tract 11, Hennepin County, Minnesota"   "2004"      "69509"    
     state county tract   
[1,] "27"  "053"  "000101"
[2,] "27"  "053"  "000102"
[3,] "27"  "053"  "000300"
[4,] "27"  "053"  "000601"
[5,] "27"  "053"  "000603"
[6,] "27"  "053"  "001100"

hennepin_httr2 <- as_tibble(resp_json_df) |>
  mutate(population = parse_number(B01003_001E),
         median_income = parse_number(B19013_001E)) |>
  select(-B01003_001E, -B19013_001E, -state, -county)
  
hennepin_httr2 |>
  ggplot(aes(x = population, y = median_income)) + 
    geom_point() + 
    geom_smooth(method = "lm")

summary(hennepin_httr2$population)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    2876    3714    3815    4651    9680

summary(hennepin_httr2$median_income)

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-666666666      61354      80966   -3966166     107232     250001

sort(hennepin_httr2$population)

  [1]    0  223 1514 1622 1672 1760 1766 1779 1798 1844 1848 1877 1897 1915 1926
 [16] 1935 1942 1973 2000 2004 2012 2013 2017 2038 2058 2061 2067 2092 2111 2123
 [31] 2130 2150 2163 2228 2235 2256 2272 2274 2280 2283 2295 2315 2339 2341 2357
 [46] 2399 2415 2416 2419 2460 2462 2476 2484 2499 2511 2511 2528 2532 2551 2570
 [61] 2594 2605 2625 2656 2658 2668 2670 2675 2681 2724 2738 2756 2763 2780 2796
 [76] 2808 2820 2822 2837 2848 2853 2865 2876 2878 2916 2935 2944 2950 2954 2969
 [91] 2971 2984 2994 3001 3036 3037 3038 3046 3047 3048 3075 3077 3119 3124 3127
[106] 3138 3150 3152 3162 3168 3193 3222 3224 3224 3225 3236 3251 3274 3298 3301
[121] 3305 3317 3317 3326 3331 3335 3341 3364 3372 3376 3379 3386 3404 3404 3418
[136] 3431 3439 3444 3454 3466 3472 3474 3486 3498 3512 3513 3557 3573 3574 3575
[151] 3585 3607 3628 3631 3634 3654 3656 3666 3671 3673 3676 3687 3703 3710 3714
[166] 3739 3750 3762 3764 3765 3799 3801 3805 3806 3808 3810 3811 3829 3832 3842
[181] 3853 3862 3877 3885 3890 3895 3896 3896 3903 3903 3913 3924 3930 3960 3967
[196] 3972 3974 3976 3978 3980 3989 3995 4008 4010 4013 4025 4036 4063 4086 4097
[211] 4098 4126 4132 4179 4200 4219 4228 4237 4273 4286 4295 4305 4319 4321 4326
[226] 4355 4359 4366 4371 4378 4385 4412 4441 4455 4460 4466 4472 4481 4503 4535
[241] 4584 4587 4591 4613 4622 4629 4651 4665 4671 4678 4693 4696 4706 4713 4718
[256] 4728 4747 4767 4769 4789 4789 4815 4855 4855 4874 4899 4919 4930 4972 4978
[271] 4983 4992 5030 5033 5041 5065 5085 5099 5107 5150 5195 5213 5244 5262 5267
[286] 5295 5305 5313 5364 5366 5385 5386 5415 5442 5459 5507 5510 5515 5541 5541
[301] 5587 5709 5725 5781 5821 5831 5872 5880 5980 6025 6069 6071 6102 6113 6166
[316] 6229 6249 6258 6265 6308 6482 6595 6709 6928 7286 7604 7828 9486 9680

sort(hennepin_httr2$median_income)

  [1] -666666666 -666666666      14748      20000      22768      23256
  [7]      23391      25708      31513      31981      32321      32758
 [13]      34273      35368      35855      36700      37315      37346
 [19]      37413      38286      38554      39420      39605      39609
 [25]      39630      40400      40476      40603      40867      42426
 [31]      42550      42753      43036      43750      44867      45640
 [37]      46157      46333      46596      47139      47197      47688
 [43]      47857      48464      48690      48750      49028      49139
 [49]      49659      50000      50741      50755      50935      51250
 [55]      51513      51705      51923      52169      52304      52370
 [61]      52781      52917      53393      53542      53564      53952
 [67]      54026      54636      55321      55430      55833      56338
 [73]      56955      57469      57802      57875      58426      59013
 [79]      59704      59876      60375      61213      61354      61547
 [85]      62188      62279      62404      62426      62770      63750
 [91]      63990      64250      64333      64621      64676      64792
 [97]      65323      65329      65395      65455      65590      65772
[103]      66364      66452      66549      66875      67102      67132
[109]      67473      67614      68114      68158      68369      68417
[115]      68434      68796      68913      68971      69509      69600
[121]      70089      70927      70970      71071      71122      71146
[127]      71250      71670      71818      72054      72102      72766
[133]      72853      73482      73514      73527      73897      73984
[139]      74286      74330      74817      75147      75556      75833
[145]      76111      76164      76417      76792      76839      77500
[151]      78137      78171      78333      78418      78509      78605
[157]      78728      79167      79191      79366      79750      80012
[163]      80080      80350      80966      81341      81341      81411
[169]      81977      82014      82098      82340      82527      83090
[175]      83250      83315      83380      84063      84569      84583
[181]      84792      85078      85221      85938      86106      86111
[187]      86904      87054      87390      87426      87599      87857
[193]      88431      88542      88895      89417      89740      89792
[199]      89891      89922      90167      91230      91250      91333
[205]      91637      91827      92019      92683      92941      93011
[211]      93750      94656      95750      95855      95980      96328
[217]      96378      96667      96856      96875      96983      97609
[223]      98137      98550      98986      99792      99853     100054
[229]     100329     100652     100761     101156     101194     101440
[235]     101578     103049     103531     103611     103750     104242
[241]     104306     104412     104795     104904     106310     106518
[247]     107232     107303     108476     108510     109722     110125
[253]     110339     110694     110729     110774     111364     111635
[259]     111950     112104     112557     112566     113563     113750
[265]     114550     115934     116281     116861     117631     118333
[271]     118594     118697     118828     119214     119821     120769
[277]     122180     122206     123312     125750     126250     127375
[283]     127396     130404     130486     131023     131042     132361
[289]     132604     133333     133472     133504     133859     134250
[295]     136012     136369     138848     141528     141984     142500
[301]     142889     143125     143744     143935     144282     144318
[307]     146328     147237     147672     148512     148611     149934
[313]     153917     154306     159857     161458     161471     165865
[319]     176580     178259     179743     179926     180463     185357
[325]     194417     194882     200438     202098     250001

hennepin_httr2 <- hennepin_httr2 |>
  mutate(median_income = ifelse(median_income > 0, median_income, NA),
         population = ifelse(population > 0, population, NA))
  
hennepin_httr2 |>
  ggplot(aes(x = population, y = median_income)) + 
    geom_point() + 
    geom_smooth(method = "lm")

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

# To make choropleth map by census tract, would need to download US Census
#   Bureau TIGER geometries using tigris package

# Finally using httr
url <- str_c("https://api.census.gov/data/2020/acs/acs5?get=NAME,B01003_001E,B19013_001E&for=tract:*&in=state:27&in=county:053", "&key=", CENSUS_API_KEY)
acs5 <- GET(url)
details <- content(acs5, "parsed")
# details 
details[[1]]  # variable names

[[1]]
[1] "NAME"

[[2]]
[1] "B01003_001E"

[[3]]
[1] "B19013_001E"

[[4]]
[1] "state"

[[5]]
[1] "county"

[[6]]
[1] "tract"

details[[2]]  # list with information on 1st tract

[[1]]
[1] "Census Tract 1.01, Hennepin County, Minnesota"

[[2]]
[1] "3472"

[[3]]
[1] "70927"

[[4]]
[1] "27"

[[5]]
[1] "053"

[[6]]
[1] "000101"

name = character()
population = double()
median_income = double()
tract = character()

for(i in 2:330) {
  name[i-1] <- details[[i]][[1]][1]
  population[i-1] <- details[[i]][[2]][1]
  median_income[i-1] <- details[[i]][[3]][1]
  tract[i-1] <- details[[i]][[6]][1]
}
hennepin_httr <- tibble(
  name = name,
  population = parse_number(population),
  median_income = parse_number(median_income),
  tract = tract
)

On Your Own

Write a for loop to obtain the Hennepin County data from 2017-2021
Write a function to give choices about year, county, and variables
Use your function from (2) along with map and list_rbind to build a data set for Rice county for the years 2019-2021

One more example using an API key

Here’s an example of getting data from a website that attempts to make imdb movie data available as an API.

Initial instructions:

go to omdbapi.com under the API Key tab and request a free API key
store your key as discussed earlier
explore the examples at omdbapi.com

We will first obtain data about the movie Coco from 2017.

myapikey <- Sys.getenv("OMDB_KEY")

# Find url exploring examples at omdbapi.com
url <- str_c("http://www.omdbapi.com/?t=Coco&y=2017&apikey=", myapikey)

coco <- GET(url)   # coco holds response from server
coco               # Status of 200 is good!

details <- content(coco, "parse")   
details                         # get a list of 25 pieces of information
details$Year                    # how to access details
details[[2]]                    # since a list, another way to access

Now build a data set for a collection of movies

# Must figure out pattern in URL for obtaining different movies
#  - try searching for others
movies <- c("Coco", "Wonder+Woman", "Get+Out", 
            "The+Greatest+Showman", "Thor:+Ragnarok")

# Set up empty tibble
omdb <- tibble(Title = character(), Rated = character(), Genre = character(),
       Actors = character(), Metascore = double(), imdbRating = double(),
       BoxOffice = double())

# Use for loop to run through API request process 5 times,
#   each time filling the next row in the tibble
#  - can do max of 1000 GETs per day
for(i in 1:5) {
  url <- str_c("http://www.omdbapi.com/?t=",movies[i],
               "&apikey=", myapikey)
  Sys.sleep(0.5)
  onemovie <- GET(url)
  details <- content(onemovie, "parse")
  omdb[i,1] <- details$Title
  omdb[i,2] <- details$Rated
  omdb[i,3] <- details$Genre
  omdb[i,4] <- details$Actors
  omdb[i,5] <- parse_number(details$Metascore)
  omdb[i,6] <- parse_number(details$imdbRating)
  omdb[i,7] <- parse_number(details$BoxOffice)   # no $ and ,'s
}

omdb

#  could use stringr functions to further organize this data - separate 
#    different genres, different actors, etc.

On Your Own (continued)

(Based on final project by Mary Wu and Jenna Graff, MSCS 264, Spring 2024). Start with a small data set on 56 national parks from kaggle, and supplement with columns for the park address (a single column including address, city, state, and zip code) and a list of available activities (a single character column with activities separated by commas) from the park websites themselves.

Preliminaries:

Request API here
Check out API guide

np_kaggle <- read_csv("~/264_fall_2024/Data/parks.csv")

Rows: 56 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Park Code, Park Name, State
dbl (3): Acres, Latitude, Longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.