Press s to see my lecture notes.
There are two underlying important pieces of information for spatial data:
Spatial data with a defined CRS can either be vector or raster data.
Though we refer to a shape file in the singular, it’s actually a collection of at least three basic files:
All files must be present in the directory and named the same (except for the file extension) to import correctly.
We’ll start by importing in a shape file of state boundaries from the Census.
st_read() is the function to import the shapefile.
Type out the code below or copy and paste it into the console or run simple_states
from the static_maps_slides.Rmd file
library(ggplot2)
library(sf)
fifty_location <- "data/cb_2017_us_state_20m/cb_2017_us_state_20m.shp"
fifty_states <- st_read(fifty_location)
View(fifty_states)
Map with ggplot2 functions and geom_sf
ggplot(fifty_states) + geom_sf()
Let’s pull in population data from CensusReporter.org
Using read_csv() from the readr package.
library(readr)
populations <- read_csv("data/acs2016_1yr_B02001_04000US55.csv")
View(populations)
ncol(fifty_states)
## [1] 10
library(dplyr)
fifty_states <- left_join(fifty_states, populations,
by=c("NAME"="name"))
ncol(fifty_states)
## [1] 31
There are a lot of variable names in this data frame. Check them out.
colnames(fifty_states)
## [1] "STATEFP" "STATENS" "AFFGEOID"
## [4] "GEOID" "STUSPS" "NAME"
## [7] "LSAD" "ALAND" "AWATER"
## [10] "geoid" "B02001001" "B02001001, Error"
## [13] "B02001002" "B02001002, Error" "B02001003"
## [16] "B02001003, Error" "B02001004" "B02001004, Error"
## [19] "B02001005" "B02001005, Error" "B02001006"
## [22] "B02001006, Error" "B02001007" "B02001007, Error"
## [25] "B02001008" "B02001008, Error" "B02001009"
## [28] "B02001009, Error" "B02001010" "B02001010, Error"
## [31] "geometry"
geom_sf()
forty_eight <- fifty_states %>%
filter(NAME!="Hawaii" & NAME!="Alaska" & NAME!="Puerto Rico")
ggplot(forty_eight) +
geom_sf(aes(fill=B02001001)) +
scale_fill_distiller(direction=1, name="Population") +
labs(title="Population of 48 states", caption="Source: US Census")
forty_eight <- fifty_states %>%
filter(NAME!="Hawaii" & NAME!="Alaska" & NAME!="Puerto Rico")
ggplot(forty_eight) +
geom_sf(aes(fill=B02001001)) +
scale_fill_distiller(direction=1, name="Population") +
labs(title="Population of 48 states", caption="Source: US Census")
forty_eight <- fifty_states %>%
filter(NAME!="Hawaii" & NAME!="Alaska" & NAME!="Puerto Rico")
ggplot(forty_eight) +
geom_sf(aes(fill=B02001001)) +
scale_fill_distiller(direction=1, name="Population") +
labs(title="Population of 48 states", caption="Source: US Census")
Using the tigris package, which lets us download Census shapefiles directly into R without having to unzip and point to directories, etc.
Simply call any of these functions (with the proper location variables):
tracts()
counties()
school_districts()
roads()
First, let’s make sure the shape files download as sf files (because it can also handle sp versions, as well)
If cb is set to TRUE, it downloads a generalized (1:500k) counties file. Default is FALSE (the most detailed TIGER file).
library(tigris)
options(tigris_use_cache = TRUE)
options(tigris_class = "sf")
tx <- counties("TX", cb=T)
View(tx)
Looks familiar, right?
fifty_location <- "data/cb_2017_us_state_20m/cb_2017_us_state_20m.shp"
fifty_states <- st_read(fifty_location)
View(fifty_states)
ggplot(tx) +
geom_sf() +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent')) +
labs(title="Texas counties")
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent')) +
Time to add some data
Instead of downloading data from the horrible-to-navigate Census FactFinder or pleasant-to-navigate CensusReporter.org we can pull the code with the censusapi package from Hannah Recht, of Bloomberg.
First, sign up for a census key and load it into the environment.
Replace YOURKEYHERE with your Census API key.
# Add key to .Renviron
Sys.setenv(CENSUS_KEY="YOURKEYHERE")
# Reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")
library(censusapi)
Check out the dozens of data sets you have access to now.
apis <- listCensusApis()
View(apis)
We’ll focus on using the getCensus()
function from the package. It makes an API call and returns a data frame of results.
These are the arguments you’ll need to pass it:
name
- the name of the Census data set, like “acs5” or “timeseries/bds/firms”vintage
- the year of the data setvars
- one or more variables to access (remember B02001001 from above?)region
- the geography level of data, like county or tracts or statePlease don’t run this right now.
You can use listCensusMetadata()
to see what tables might be available from the ACS Census survey.
acs_vars <- listCensusMetadata(name="acs/acs5", type="variables", vintage=2016)
View(acs_vars)
tx_income <- getCensus(name = "acs/acs5", vintage = 2016,
vars = c("NAME", "B19013_001E", "B19013_001M"),
region = "county:*", regionin = "state:48")
head(tx_income)
## state county NAME B19013_001E B19013_001M
## 1 48 001 Anderson County, Texas 42146 2539
## 2 48 003 Andrews County, Texas 70121 7053
## 3 48 005 Angelina County, Texas 44185 2107
## 4 48 007 Aransas County, Texas 44851 4261
## 5 48 009 Archer County, Texas 62407 5368
## 6 48 011 Armstrong County, Texas 65000 9415
tx4ever <- left_join(tx, tx_income, by=c("COUNTYFP"="county"))
ggplot(tx4ever) +
geom_sf(aes(fill=B19013_001E), color="white") +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent')) +
scale_fill_distiller(palette="Oranges", direction=1, name="Median income") +
labs(title="2016 Median income in Texas counties", caption="Source: US Census/ACS5 2016")
Newest package for Census data: tidycensus
With tidycensus, you can download the shape files with the data you want already attached. No joins necessary.
Let’s get right into mapping. We’ll calculate unemployment percents by Census tract in Jersey City. It’ll involve wrangling some data. But querying the data with get_acs()
will be easy and so will getting the shape file by simply passing it geometry=T
.
library(tidycensus)
Pass it your Census key.
census_api_key("YOUR API KEY GOES HERE")
jobs <- c(labor_force = "B23025_005E",
unemployed = "B23025_002E")
jersey <- get_acs(geography="tract", year=2016,
variables= jobs, county = "Hudson",
state="NJ", geometry=T)
head(jersey)
## Simple feature collection with 6 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -74.05794 ymin: 40.74748 xmax: -74.03778 ymax: 40.76072
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## GEOID NAME variable
## 1 34017000100 Census Tract 1, Hudson County, New Jersey B23025_002
## 2 34017000100 Census Tract 1, Hudson County, New Jersey B23025_005
## 3 34017000200 Census Tract 2, Hudson County, New Jersey B23025_002
## 4 34017000200 Census Tract 2, Hudson County, New Jersey B23025_005
## 5 34017000300 Census Tract 3, Hudson County, New Jersey B23025_002
## 6 34017000300 Census Tract 3, Hudson County, New Jersey B23025_005
## estimate moe geometry
## 1 3488 572 MULTIPOLYGON (((-74.05794 4...
## 2 328 147 MULTIPOLYGON (((-74.05794 4...
## 3 2774 409 MULTIPOLYGON (((-74.05223 4...
## 4 85 69 MULTIPOLYGON (((-74.05223 4...
## 5 2397 305 MULTIPOLYGON (((-74.04682 4...
## 6 158 65 MULTIPOLYGON (((-74.04682 4...
library(tidyr)
jersey %>%
mutate(variable=case_when(
variable=="B23025_005" ~ "Unemployed",
variable=="B23025_002" ~ "Workforce")) %>%
select(-moe) %>%
spread(variable, estimate) %>%
mutate(percent_unemployed=round(Unemployed/Workforce*100,2)) %>%
ggplot(aes(fill=percent_unemployed)) +
geom_sf(color="white") +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent')) +
scale_fill_distiller(palette="Reds", direction=1, name="Estimate") +
labs(title="Percent unemployed in Jersey City",
caption="Source: US Census/ACS5 2016")
racevars <- c(White = "B02001_002",
Black = "B02001_003",
Asian = "B02001_005",
Hispanic = "B03003_003")
harris <- get_acs(geography = "tract", variables = racevars,
state = "TX", county = "Harris County", geometry = TRUE,
summary_var = "B02001_001", year=2017)
head(harris)
head(harris)
## Simple feature collection with 6 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -95.37457 ymin: 29.74486 xmax: -95.34116 ymax: 29.77055
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## GEOID NAME variable estimate
## 1 48201100000 Census Tract 1000, Harris County, Texas White 3339
## 2 48201100000 Census Tract 1000, Harris County, Texas Black 780
## 3 48201100000 Census Tract 1000, Harris County, Texas Asian 231
## 4 48201100000 Census Tract 1000, Harris County, Texas Hispanic 995
## 5 48201210100 Census Tract 2101, Harris County, Texas White 2832
## 6 48201210100 Census Tract 2101, Harris County, Texas Black 3029
## moe summary_est summary_moe geometry
## 1 392 4712 480 MULTIPOLYGON (((-95.37457 2...
## 2 268 4712 480 MULTIPOLYGON (((-95.37457 2...
## 3 132 4712 480 MULTIPOLYGON (((-95.37457 2...
## 4 281 4712 480 MULTIPOLYGON (((-95.37457 2...
## 5 587 6252 1029 MULTIPOLYGON (((-95.35934 2...
## 6 574 6252 1029 MULTIPOLYGON (((-95.35934 2...
library(viridis)
harris %>%
mutate(pct = 100 * (estimate / summary_est)) %>%
ggplot(aes(fill = pct, color = pct)) +
facet_wrap(~variable) +
geom_sf() +
coord_sf(crs = 26915) +
scale_fill_viridis(direction=-1) +
scale_color_viridis(direction=-1) +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent')) +
labs(title="Racial geography of Harris County, Texas", caption="Source: US Census 2010")
If you pass shift_geo=T
to the get_acs()
function in tidycensus then the states will be re positioned.
county_pov <- get_acs(geography = "county",
variables = "B17001_002",
summary_var = "B17001_001",
geometry = TRUE,
shift_geo = TRUE) %>%
mutate(pctpov = 100 * (estimate/summary_est))
ggplot(county_pov) +
geom_sf(aes(fill = pctpov), color=NA) +
coord_sf(datum=NA) +
labs(title = "Percent of population in poverty by county",
subtitle = "Alaska and Hawaii are shifted and not to scale",
caption = "Source: ACS 5-year, 2016",
fill = "% in poverty") +
scale_fill_viridis(direction=-1)
The Leaflet R package was created by the folks behind RStudio to integrate with the popular opensource JavaScript library.
Essentially, this package lets you make maps with custom map tiles, markers, polygons, lines, popups, and geojson.
Almost any maps you can make in Google Fusion Tables or Carto(DB), you can make in R using the Leaflet package.
Start with the tx
shapefile we downloaded via API
leaflet()
- initializes the leaflet functionaddTiles()
- the underlying map tilesaddPolygons()
- instead of dots, we’re adding Polygons, or shapes
popup
to the function with the variable NAME from the shape filelibrary(leaflet)
tx %>%
leaflet() %>%
addTiles() %>%
addPolygons(popup=~NAME)
# Creating a color palette based on the number range in the B19013_001E column
pal <- colorNumeric("Reds", domain=tx4ever$B19013_001E)
# Setting up the pop up text
popup_sb <- paste0("Median income in ", tx4ever$NAME.x, "\n$", as.character(tx4ever$B19013_001E))
addProviderTiles()
- instead of addTiles()
setView()
- sets the starting position of the map
addPolygons()
fillColor()
fillOpacity()
weight
smoothFactor()
addLegend()
- same as in the previous section but with more customization# Mapping it with the new tiles CartoDB.Positron
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(-98.807691, 31.45037, zoom = 6) %>%
addPolygons(data = tx4ever ,
fillColor = ~pal(tx4ever$B19013_001E),
fillOpacity = 0.7,
weight = 0.2,
smoothFactor = 0.2,
popup = ~popup_sb) %>%
addLegend(pal = pal,
values = tx4ever$B19013_001E,
position = "bottomright",
title = "Median income")