# Initial packages required (we'll be adding more)
library(tidyverse)
library(mdsr) # package associated with our MDSR book
Creating informative maps
You can download this .qmd file from here. Just hit the Download Raw File button.
Opening example
Here is a simple choropleth map example from Section 3.2.3 of MDSR. Note how we use an underlying map with strategic shading to convey a story about a variable that’s been measured on each country.
# CIACountries is a 236 x 8 data set with information on each country
# taken from the CIA factbook - gdp, education, internet use, etc.
head(CIACountries)
|>
CIACountries select(country, oil_prod) |>
mutate(oil_prod_disc = cut(oil_prod,
breaks = c(0, 1e3, 1e5, 1e6, 1e7, 1e8),
labels = c(">1000", ">10,000", ">100,000", ">1 million",
">10 million"))) |>
1::mWorldMap(key = "country") +
mosaicgeom_polygon(aes(fill = oil_prod_disc)) +
scale_fill_brewer("Oil Prod. (bbl/day)", na.value = "white") +
theme(legend.position = "top")
- 1
- We won’t use mWorldMap often, but it’s a good quick illustration
country pop area oil_prod gdp educ roadways net_users
1 Afghanistan 32564342 652230 0 1900 NA 0.06462444 >5%
2 Albania 3029278 28748 20510 11900 3.3 0.62613051 >35%
3 Algeria 39542166 2381741 1420000 14500 4.3 0.04771929 >15%
4 American Samoa 54343 199 0 13000 NA 1.21105528 <NA>
5 Andorra 85580 468 NA 37200 NA 0.68376068 >60%
6 Angola 19625353 1246700 1742000 7300 3.5 0.04125211 >15%
Choropleth Maps
When you have specific regions (e.g. countries, states, counties, census tracts,…) and a value associated with each region.
A choropleth map will color the entire region according to the value. For example, let’s consider state vaccination data from March 2021.
<- read_csv("https://proback.github.io/264_fall_2024/Data/vacc_Mar21.csv")
vaccines
<- vaccines |>
vacc_mar13 filter(Date =="2021-03-13") |>
select(State, Date, people_vaccinated_per100, share_doses_used, Governor)
vacc_mar13
# A tibble: 50 × 5
State Date people_vaccinated_per100 share_doses_used Governor
<chr> <date> <dbl> <dbl> <chr>
1 Alabama 2021-03-13 17.2 0.671 R
2 Alaska 2021-03-13 27.0 0.686 R
3 Arizona 2021-03-13 21.5 0.821 R
4 Arkansas 2021-03-13 19.2 0.705 R
5 California 2021-03-13 20.3 0.726 D
6 Colorado 2021-03-13 20.8 0.801 D
7 Connecticut 2021-03-13 26.2 0.851 D
8 Delaware 2021-03-13 20.2 0.753 D
9 Florida 2021-03-13 20.1 0.766 R
10 Georgia 2021-03-13 15.2 0.674 R
# ℹ 40 more rows
The tricky part of choropleth maps is getting the shapes (polygons) that make up the regions. This is really a pretty complex set of lines for R to draw!
Luckily, some maps are already created in R in the maps
package.
library(maps)
<- map_data("state")
us_states head(us_states)
long lat group order region subregion
1 -87.46201 30.38968 1 1 alabama <NA>
2 -87.48493 30.37249 1 2 alabama <NA>
3 -87.52503 30.37249 1 3 alabama <NA>
4 -87.53076 30.33239 1 4 alabama <NA>
5 -87.57087 30.32665 1 5 alabama <NA>
6 -87.58806 30.32665 1 6 alabama <NA>
|>
us_states ggplot(mapping = aes(x = long, y = lat,
group = group)) +
geom_polygon(fill = "white", color = "black")
[Pause to ponder:] What might the group
and order
columns represent?
Other maps provided by the maps
package include US counties, France, Italy, New Zealand, and two different views of the world. If you want maps of other countries or regions, you can often find them online.
Where the really cool stuff happens is when we join our data to the us_states
dataframe. Notice that the state name appears in the “region” column of us_states
, and that the state name is in all small letters. In vacc_mar13
, the state name appears in the State column and is in title case. Thus, we have to be very careful when we join the state vaccine info to the state geography data.
Run this line by line to see what it does:
<- vacc_mar13 |>
vacc_mar13 mutate(State = str_to_lower(State))
|>
vacc_mar13 right_join(us_states, by = c("State" = "region")) |>
rename(region = State) |>
ggplot(mapping = aes(x = long, y = lat,
group = group)) +
geom_polygon(aes(fill = people_vaccinated_per100), color = "black")
oops, New York appears to be a problem.
|>
vacc_mar13 anti_join(us_states, by = c("State" = "region"))
# A tibble: 3 × 5
State Date people_vaccinated_per100 share_doses_used Governor
<chr> <date> <dbl> <dbl> <chr>
1 alaska 2021-03-13 27.0 0.686 R
2 hawaii 2021-03-13 22.8 0.759 D
3 new york state 2021-03-13 21.7 0.764 D
|>
us_states anti_join(vacc_mar13, by = c("region" = "State")) |>
count(region)
region n
1 district of columbia 10
2 new york 495
[Pause to ponder:] What did we learn by running anti_join()
above?
Notice that the us_states
map also includes only the contiguous 48 states. This gives an example of creating really beautiful map insets for Alaska and Hawaii.
<- vacc_mar13 |>
vacc_mar13 mutate(State = str_replace(State, " state", ""))
|>
vacc_mar13 anti_join(us_states, by = c("State" = "region"))
# A tibble: 2 × 5
State Date people_vaccinated_per100 share_doses_used Governor
<chr> <date> <dbl> <dbl> <chr>
1 alaska 2021-03-13 27.0 0.686 R
2 hawaii 2021-03-13 22.8 0.759 D
|>
us_states anti_join(vacc_mar13, by = c("region" = "State")) %>%
count(region)
region n
1 district of columbia 10
Better.
library(viridis) # for color schemes
|>
vacc_mar13 right_join(us_states, by = c("State" = "region")) |>
rename(region = State) |>
ggplot(mapping = aes(x = long, y = lat,
group = group)) +
geom_polygon(aes(fill = people_vaccinated_per100), color = "black") +
labs(fill = "People Vaccinated\nper 100 pop.") +
1coord_map() +
2theme_void() +
3scale_fill_viridis()
- 1
- This scales the longitude and latitude so that the shapes look correct. coord_quickmap() can also work here - it’s less exact but faster.
- 2
- This theme can give you a really clean look
- 3
- You can change the fill scale for different color schemes.
You can also use a categorical variable to color regions:
|>
vacc_mar13 right_join(us_states, by = c("State" = "region")) |>
rename(region = State) |>
ggplot(mapping = aes(x = long, y = lat,
group = group)) +
geom_polygon(aes(fill = Governor), color = "darkgrey", linewidth = 0.2) +
labs(fill = "Governor") +
coord_map() +
theme_void() +
1scale_fill_manual(values = c("blue", "red"))
- 1
- You can change the fill scale for different color schemes.
Note: Map projections are actually pretty complicated, especially if you’re looking at large areas (e.g. world maps) or drilling down to very small regions where a few feet can make a difference (e.g. tracking a car on a map of roads). It’s impossible to preserve both shape and area when projecting an (imperfect) sphere onto a flat surface, so that’s why you sometimes see such different maps of the world. This is why packages like maps
which connect latitude-longitude points are being phased out in favor of packages like sf
with more GIS functionality. We won’t get too deep into GIS in this class, but to learn more, take Spatial Data Analysis!!
Multiple maps!
You can still use data viz tools from Data Science 1 (like facetting) to create things like time trends in maps:
library(lubridate)
<- vaccines |>
weekly_vacc mutate(State = str_to_lower(State)) |>
mutate(State = str_replace(State, " state", ""),
week = week(Date)) |>
group_by(week, State) |>
summarize(date = first(Date),
mean_daily_vacc = mean(daily_vaccinated/est_population*1000)) |>
right_join(us_states, by =c("State" = "region")) |>
rename(region = State)
|>
weekly_vacc filter(week > 2, week < 11) |>
ggplot(mapping = aes(x = long, y = lat,
group = group)) +
geom_polygon(aes(fill = mean_daily_vacc), color = "darkgrey",
linewidth = 0.1) +
labs(fill = "Weekly Average Daily Vaccinations per 1000") +
coord_map() +
theme_void() +
scale_fill_viridis() +
facet_wrap(~date) +
theme(legend.position = "bottom")
[Pause to ponder:] are we bothered by the warning about many-to-many when you run the code above?
Other cool state maps
statebin (square representation of states)
library(statebins) # may need to install
|>
vacc_mar13 mutate(State = str_to_title(State)) |>
statebins(state_col = "State",
value_col = "people_vaccinated_per100") +
1theme_statebins() +
labs(fill = "People Vaccinated per 100")
- 1
- One nice layout. You can customize with usual ggplot themes.
[Pause to ponder:] Why might one use a map like above instead of our previous choropleth maps?
I used this example to create the code above. The original graph is located here.
Interactive point maps with leaflet
To add even more power and value to your plots, we can add interactivity. For now, we will use the leaflet
package, but later in the course we will learn even more powerful and flexible approaches for creating interactive plots and webpages.
For instance, here is a really simple plot with a pop-up window:
library(leaflet)
leaflet() |>
1addTiles() |>
2setView(-93.1832, 44.4597, zoom = 17) |>
3addPopups(-93.1832, 44.4597, 'Here is the <b>Regents Hall of Mathematical Sciences</b>, home of the Statistics and Data Science program at St. Olaf College')
- 1
- addTiles() uses OpenStreetMap, an awesome open-source mapping resource, as the default tile layer (background map)
- 2
- setView() centers the map at a specific latitude and longitude, then zoom controls how much of the surrounding area is shown
- 3
- add a popup message (with html formatting) that can be clicked on or off
Leaflet is not part of the tidyverse, but the structure of its code is pretty similar and it also plays well with piping.
Let’s try pop-up messages with a data set containing Airbnb listings in the Boston area:
leaflet() |>
addTiles() |>
setView(lng = mean(airbnb.df$Long), lat = mean(airbnb.df$Lat),
zoom = 13) |>
addCircleMarkers(data = airbnb.df,
lat = ~ Lat,
lng = ~ Long,
popup = ~ AboutListing,
radius = ~ S_Accomodates,
# These last options describe how the circles look
weight = 2,
color = "red",
fillColor = "yellow")
[Pause to ponder:] List similarities and differences between leaflet plots and ggplots.
Interactive choropleth maps with leaflet
OK. Now let’s see if we can put things together and duplicate the interactive choropleth map found here showing population density by state in the US.
A preview to shapefiles and the sf package
1library(sf)
2<- read_sf("https://rstudio.github.io/leaflet/json/us-states.geojson")
states 3class(states)
states
- 1
-
sf
stands for “simple features” - 2
- From https://leafletjs.com/examples/choropleth/us-states.js
- 3
-
Note that
states
has classsf
in addition to the usualtbl
anddf
[1] "sf" "tbl_df" "tbl" "data.frame"
Simple feature collection with 52 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -188.9049 ymin: 17.92956 xmax: -65.6268 ymax: 71.35163
Geodetic CRS: WGS 84
# A tibble: 52 × 4
id name density geometry
<chr> <chr> <dbl> <MULTIPOLYGON [°]>
1 01 Alabama 94.6 (((-87.3593 35.00118, -85.60667 34.98475…
2 02 Alaska 1.26 (((-131.602 55.11798, -131.5692 55.28229…
3 04 Arizona 57.0 (((-109.0425 37.00026, -109.048 31.33163…
4 05 Arkansas 56.4 (((-94.47384 36.50186, -90.15254 36.4963…
5 06 California 242. (((-123.2333 42.00619, -122.3789 42.0116…
6 08 Colorado 49.3 (((-107.9197 41.00391, -105.729 40.99843…
7 09 Connecticut 739. (((-73.05353 42.03905, -71.79931 42.0226…
8 10 Delaware 464. (((-75.41409 39.80446, -75.5072 39.68396…
9 11 District of Columbia 10065 (((-77.03526 38.99387, -76.90929 38.8952…
10 12 Florida 353. (((-85.49714 30.99754, -85.00421 31.0030…
# ℹ 42 more rows
For maps in leaflet
that show boundaries and not just points, we need to input a shapefile rather than a series of latitude-longitude combinations like we did for the maps
package. In the example we’re emulating, they use the read_sf()
function from the sf
package to read in data. While our us_states
data frame from the maps
package contained 15537 rows, our simple features object states
contains only 52 rows - one per state. Importantly, states
contains a column called geometry
, which is a “multipolygon” with all the information necessary to draw a specific state. Also, while states
can be treated as a tibble or data frame, it is also an sf
class object with a specific “geodetic coordinate reference system”. Again, take Spatial Data Analysis for more on shapefiles and simple features!
Note also that the authors of this example have already merged state population densities with state geometries, but if we wanted to merge in other state characteristics using the name
column as a key, we could definitely do this!
First we’ll start with a static plot using a simple features object and geom_sf():
# Create density bins as on the webpage
<- states |>
state_plotting_sf mutate(density_intervals = cut(density, n = 8,
breaks = c(0, 10, 20, 50, 100, 200, 500, 1000, Inf))) |>
filter(!(name %in% c("Alaska", "Hawaii", "Puerto Rico")))
ggplot(data = state_plotting_sf) +
geom_sf(aes(fill = density_intervals), colour = "white", linetype = 2) +
# geom_sf_label(aes(label = density)) + # labels too busy here
theme_void() +
scale_fill_brewer(palette = "YlOrRd")
Now let’s use leaflet
to create an interactive plot!
# Create our own category bins for population densities
# and assign the yellow-orange-red color palette
<- c(0, 10, 20, 50, 100, 200, 500, 1000, Inf)
bins <- colorBin("YlOrRd", domain = states$density, bins = bins)
pal
# Create labels that pop up when we hover over a state. The labels must
# be part of a list where each entry is tagged as HTML code.
library(htmltools)
library(glue)
<- states |>
states mutate(labels = str_c(name, ": ", density, " people / sq mile"))
# If want more HTML formatting, use these lines instead of those above:
#states <- states |>
# mutate(labels = glue("<strong>{name}</strong><br/>{density} people / #mi<sup>2</sup>"))
<- lapply(states$labels, HTML)
labels
leaflet(states) %>%
setView(-96, 37.8, 4) %>%
addTiles() %>%
addPolygons(
fillColor = ~pal(density),
weight = 2,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7,
highlightOptions = highlightOptions(
weight = 5,
color = "#666",
dashArray = "",
fillOpacity = 0.7,
bringToFront = TRUE),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addLegend(pal = pal, values = ~density, opacity = 0.7, title = NULL,
position = "bottomright")
[Pause to ponder:] Pick several formatting options in the code above, determine what they do, and then change them to create a customized look.
On Your Own
The states
dataset in the poliscidata
package contains 135 variables on each of the 50 US states. See here for more detail.
Your task is to create a two meaningful choropleth plots, one using a numeric variable and one using a categorical variable from poliscidata::states
. You should make two versions of each plot: a static plot using the maps
package and ggplot()
, and an interactive plot using the sf
package and leaflet()
. Write a sentence or two describing what you can learn from each plot.
Here’s some R code and hints to get you going:
# Get info to draw US states for geom_polygon (connect the lat-long points)
library(maps)
<- as_tibble(map_data("state")) |>
states_polygon select(region, group, order, lat, long)
# See what the state (region) levels look like in states_polygon
unique(states_polygon$region)
[1] "alabama" "arizona" "arkansas"
[4] "california" "colorado" "connecticut"
[7] "delaware" "district of columbia" "florida"
[10] "georgia" "idaho" "illinois"
[13] "indiana" "iowa" "kansas"
[16] "kentucky" "louisiana" "maine"
[19] "maryland" "massachusetts" "michigan"
[22] "minnesota" "mississippi" "missouri"
[25] "montana" "nebraska" "nevada"
[28] "new hampshire" "new jersey" "new mexico"
[31] "new york" "north carolina" "north dakota"
[34] "ohio" "oklahoma" "oregon"
[37] "pennsylvania" "rhode island" "south carolina"
[40] "south dakota" "tennessee" "texas"
[43] "utah" "vermont" "virginia"
[46] "washington" "west virginia" "wisconsin"
[49] "wyoming"
# Get info to draw US states for geom_sf and leaflet (simple features object
# with multipolygon geometry column)
library(sf)
<- read_sf("https://rstudio.github.io/leaflet/json/us-states.geojson") |>
states_sf select(name, geometry)
# See what the state (name) levels look like in states_sf
unique(states_sf$name)
[1] "Alabama" "Alaska" "Arizona"
[4] "Arkansas" "California" "Colorado"
[7] "Connecticut" "Delaware" "District of Columbia"
[10] "Florida" "Georgia" "Hawaii"
[13] "Idaho" "Illinois" "Indiana"
[16] "Iowa" "Kansas" "Kentucky"
[19] "Louisiana" "Maine" "Maryland"
[22] "Massachusetts" "Michigan" "Minnesota"
[25] "Mississippi" "Missouri" "Montana"
[28] "Nebraska" "Nevada" "New Hampshire"
[31] "New Jersey" "New Mexico" "New York"
[34] "North Carolina" "North Dakota" "Ohio"
[37] "Oklahoma" "Oregon" "Pennsylvania"
[40] "Rhode Island" "South Carolina" "South Dakota"
[43] "Tennessee" "Texas" "Utah"
[46] "Vermont" "Virginia" "Washington"
[49] "West Virginia" "Wisconsin" "Wyoming"
[52] "Puerto Rico"
# Load in state-wise data for filling our choropleth maps
# (Note that I selected my two variables of interest to simplify)
library(poliscidata) # may have to install first
<- as_tibble(poliscidata::states) |>
polisci_data select(state, carfatal07, cook_index3)
# See what the state (state) levels look like in polisci_data
unique(polisci_data$state) # can't see trailing spaces but can see
[1] Alaska
[2] Alabama
[3] Arkansas
[4] Arizona
[5] California
[6] Colorado
[7] Connecticut
[8] Delaware
[9] Florida
[10] Georgia
[11] Hawaii
[12] Iowa
[13] Idaho
[14] Illinois
[15] Indiana
[16] Kansas
[17] Kentucky
[18] Louisiana
[19] Massachusetts
[20] Maryland
[21] Maine
[22] Michigan
[23] Minnesota
[24] Missouri
[25] Mississippi
[26] Montana
[27] NorthCarolina
[28] NorthDakota
[29] Nebraska
[30] NewHampshire
[31] NewJersey
[32] NewMexico
[33] Nevada
[34] NewYork
[35] Ohio
[36] Oklahoma
[37] Oregon
[38] Pennsylvania
[39] RhodeIsland
[40] SouthCarolina
[41] SouthDakota
[42] Tennessee
[43] Texas
[44] Utah
[45] Virginia
[46] Vermont
[47] Washington
[48] Wisconsin
[49] WestVirginia
[50] Wyoming
50 Levels: Alabama ...
# lack of internal spaces
print(polisci_data) # can see trailing spaces
# A tibble: 50 × 3
state carfatal07 cook_index3
<fct> <dbl> <fct>
1 "Alaska " 15.2 More Rep
2 "Alabama " 25.9 More Rep
3 "Arkansas " 23.7 More Rep
4 "Arizona " 17.6 Even
5 "California " 11.7 More Dem
6 "Colorado " 12.3 Even
7 "Connecticut " 8.7 More Dem
8 "Delaware " 13.6 More Dem
9 "Florida " 18.1 Even
10 "Georgia " 18.5 Even
# ℹ 40 more rows
R code hints:
- stringr functions like
str_squish
andstr_to_lower
andstr_replace_all
(be sure to carefully look at your keys!) - *_join functions (make sure they preserve classes)
- filter so that you only have 48 contiguous states (and maybe DC)
- for help with colors: https://rstudio.github.io/leaflet/reference/colorNumeric.html
- be sure labels pop up when scrolling with leaflet
# Make sure all keys have the same format before joining:
# all lower case, no internal or external spaces
# Now we can merge data sets together for the static and the interactive plots
# Merge with states_polygon (static)
# Check that merge worked for 48 contiguous states
# Merge with states_sf (static or interactive)
# Check that merge worked for 48 contiguous states
Numeric variable (static plot):
Numeric variable (interactive plot):
# it's okay to skip a legend here
Categorical variable (static plot):
# be really careful with matching color order to factor level order
Categorical variable (interactive plot):
# may use colorFactor() here