This tutorial is an introduction to analyzing spatial data in R, specifically through making interactive locator and choropleth maps using the Leaflet package. You’ll be introduced to the basics of using R as a fast and powerful command-line Geographical Information System (GIS).

No prior knowledge is necessary to follow along, but an understanding of how pipes (%>%) and dplyr works will help.

Other mapping tutorials

Programatically generating maps and GIFs with shape files [link]
Mapping with Census data [link]

Getting started with R Leaflet

Sometimes it’s necessary to zoom in or pan around a map for greater comprehension while exploring data spatially.

The Leaflet R package was created by the folks behind RStudio to integrate with the popular opensource Javascript library.

It’s great for journalists who have little knowledge of Javascript who want to make interesting interactives using R. And there is excellent documentation if you want to dig deeper into its functionality after this introduction.

Essentially, this package lets you make maps with custom map tiles, markers, polygons, lines, popups, and geojson. Almost any maps you can make in Google Fusion Tables or Carto(DB), you can make in R using the Leaflet package.

# Uncomment and run "install.packages" functions below if you have not yet installed these packages

#install.packages("leaflet")
library(leaflet)

#install.packages("tidyverse")
library(tidyverse)

Putting a marker on a map

Create a map widget by calling the leaflet() function
Add layers (such as features) to the map by using layer functions
- like addTiles, addMarkers, addPolygons
Print the map widget

# Initialize and assign m as the leaflet object
m <- leaflet()
# Now add tiles to it
m <- addTiles(m)
# Now, add a marker with a popup
m <- addMarkers(m, lng=-81.655210, lat=30.324303, popup="<b>Hello</b><br><a href='http://ire.org/conferences/nicar2017/'>-NICAR 2017</a>")

# Print out the map
m

Simple.

But we’re going to add another function to customize the viewport zoom and center location (setView).

And we’re going to switch to using the magrittr pipe operator (%>%) from now on to simplify our code.

# It’s easier to wrap your head around it if you think of coding grammatically. 

# Normal coding in R is rigid declarative sentences: “Bob is 32. Nancy is 4 years younger than Bob.” 

# Coding with the pipe operator: “Nancy is 4 years younger than Bob, who is 32.” 

# Pipes are a comma (or a semi-colon, if you want) that lets you create one long, run-on sentence.

This is the same code as above but using pipes

m <- leaflet() %>%
  addTiles() %>%  
  setView(-81.655210, 30.324303, zoom = 16) %>%
  addMarkers(lng=-81.655210, lat=30.324303, popup="<b>Hello</b><br><a href='http://ire.org/conferences/nicar2017/'>-NICAR 2017</a>")

# See how the m object is no longer needed to be included in every line except the first? It's just assumed now.

m

Explaining the R code

leaflet() initializes the leaflet workspace
addTiles() by itself will bring in the default OpenStreetMap tiles
- Here’s a list of free leaflet tiles you can use
- Note: OpenStreetMaps is a wonderful and free open-source service. Their only stipulation for using their tiles is to be sure to credit and link to them in the map.
setView() is pretty self-explanatory but is simpler to implement
addMarkers() with some specific parameters.

Note: The order of commands is important. A view can’t be set unless there are tiles established first.

How to put the map online

Run the code in your RStudio console and it will appear in your Viewer tab.

Click on Export > Save as Web page.

Export

Give it a filename and click save.

Save as

You have the map now as a full screen html file.

File

You can upload the file wherever you like and then iframe to it if you want to embed it into website like the code below.

<iframe src="nicarmap.html" frameborder="0" width="100%" height="300px"></iframe>

Would produce this:

Or you can leave it embedded in an RMarkdown file as the raw R code.

A note about file sizes

When comparing the size of the HTML files, the R-produced version of the map is larger in size because it is bringing all the Javascript and CSS inline into the HTML.

However, when looking at how much data is actually downloaded to load the map html, the differences aren’t as drastic.

Comparison

It’s just something to keep in mind.

Multiple locations from a csv

Let’s bring in some new data.

dunkin <- read.csv("data/dunkin.csv", stringsAsFactors=F)

# Bringing in the DataTables package to display the data in a nice format
library(DT)

# Using the datatable function from the DT package to see the first 6 rows of data
datatable(head(dunkin))

We’ve imported nearly 8,000 rows of Dunkin’ Donuts store location data.

Let’s make a map with a new tile set.

And this time, we’ll use addCircles instead of addMarkers.

Some options to use with addCircles includes the data to pull in for popup and color, which we’ve made bright orange. We’ve also set radius and weight and fillOpacity.

If we wanted to change the radius of the circle based on some datapoint, you could replace 40 with some column with numeric values in it.

m <- leaflet(dunkin) %>% addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png', 
    attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a> &mdash; Map data &copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>% 
  setView(-81.655210, 30.324303, zoom = 8) %>% 
  addCircles(~lon, ~lat, popup=dunkin$type, weight = 3, radius=40, 
                 color="#ffa500", stroke = TRUE, fillOpacity = 0.8) 

m

Why stop there?

Let’s bring in some competition.

starbucks <- read.csv("data/starbucks.csv", stringsAsFactors=F)

datatable(head(starbucks))

The data is structured a bit differently, but at least it has type and location data.

Also, let’s switch from addCircles to addCircleMarkers.

# isolating just the 3 columns we're interested in-- type, lat, and lon
sb_loc <- select(starbucks, type, lat, lon)
dd_loc <- select(dunkin, type, lat, lon)

# joining the two data frames together
ddsb <- rbind(sb_loc, dd_loc)

# creating a coffee color palette

cof <- colorFactor(c("#ffa500", "#13ED3F"), domain=c("Dunkin Donuts", "Starbucks"))
# mapping based on type
m <- leaflet(ddsb) %>% addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png', 
    attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a> &mdash; Map data &copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>% 
  setView(-81.655210, 30.324303, zoom = 8) %>% 
  addCircleMarkers(~lon, ~lat, popup=ddsb$type, weight = 3, radius=4, 
                 color=~cof(type), stroke = F, fillOpacity = 0.5) 

m

Play around with the slippy map. Interesting, right?

The file size is only 1.3 m even though there are nearly 19,000 points on the map.

Still, that’s a lot of points to process. It’s probably making your computer fan spin.

Add a legend

Let’s add a legend with the function addLegend() and options for where to place it and colors and labels.

m <- leaflet(ddsb) %>% addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png', 
    attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a> &mdash; Map data &copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>% 
  setView(-81.655210, 30.324303, zoom = 8) %>% 
  addCircleMarkers(~lon, ~lat, popup=ddsb$type, weight = 3, radius=4, 
                 color=~cof(type), stroke = F, fillOpacity = 0.5)  %>%
  addLegend("bottomright", colors= c("#ffa500", "#13ED3F"), labels=c("Dunkin'", "Starbucks"), title="Coffee places")


m

Mapping the end of the universe

According to comedian Lewis Black, the End of the Universe is in Houston, where there’s a Starbucks across the street from another Starbucks.

Let’s see if this is true and let’s look for 10 in all.

Here’s another loop and instead of comparing SB locations to Dunkin locations, we’ll compare SB locations to other SB locations.

Like before, the code to do so is below, but you can skip ahead to the next chunk if you don’t want to wait ~2 hours for the data to process and loop.

# Creating a loop to go through the Starbucks dataframe and compare it itself
# Going through each row of SB lat and lon and finding/keeping the SB lat/lon with the shortest distance to it

# First, set up some temp columns
sb_loc$sb_lat <- 0
sb_loc$sb_lon <- 0
sb_loc$feet <- 0
sb_loc$string_check <- paste(sb_loc$lat, sb_loc$lon)

# Now the loop
for (i in 1:nrow(sb_loc)) {
  print(paste0(i, " of ", nrow(sb_loc)))
# Looping through the SB dataframe

  # slicing out each row
  sb_loc_row <- subset(sb_loc[i,])
  
  # Filtering out the sliced out row so it doesn't measure against itself
  sb_loc_compare <- subset(sb_loc, string_check!=sb_loc_row$string_check[1])
  
  # Looping through the new SB dataframe
  for (x in 1:nrow(sb_loc_compare)) {

    # Using the spDistsN1 function which is a little weird because it
    #  only works if the lat lon pairs being measured are in a matrix
    to_measure_sb <- matrix(c(sb_loc_row$lon[1], sb_loc_compare$lon[x], sb_loc_row$lat[1], sb_loc_compare$lat[x]), ncol=2)
    # Comparing the entire matrix to a single row in the matrix
    km <- spDistsN1(to_measure_sb, to_measure_sb[1,], longlat=TRUE)
    # We only care about the second result sine the first result is always zero
    km <- km[2]
    
    # Converting kilometers to feet
    feet <- round(km*1000/.3048,2)
    
    # These if statements replace the current SB lat and lon and feet variables 
    #  with the first results but replaces that if
    #  the feet value is smaller than what's currently in it
    if (x==1) {
      sb_loc_row$sb_lat <- sb_loc_compare$lat[x]
      sb_loc_row$sb_lon <- sb_loc_compare$lon[x]
      sb_loc_row$feet <- feet
      sb_loc_row$sb_name <- sb_loc_compare$string_check[x]
    } else {
      if (feet < sb_loc_row$feet) {
        sb_loc_row$sb_lat <- sb_loc_compare$lat[x]
        sb_loc_row$sb_lon <- sb_loc_compare$lon[x]
        sb_loc_row$feet <- feet
        sb_loc_row$sb_name <- sb_loc_compare$string_check[x]
        }
    }
  }
  
  # This is rebuilding the dataframe row by row with the new SB dataframe values
  if (i==1) {
    sb_distances <- sb_loc_row
  } else {
    sb_distances <- rbind(sb_distances, sb_loc_row)
  }
}

# sb_distances <- unique...
write.csv(sb_distances, "data/sb_distances.csv")

Alright, map it.

# Bringing in the dataframe because I don't want to make you wait through a loop
sb_distances <- read.csv("data/sb_distances.csv")

# Arranging and filtering just the 10 locations with the shortest distances
sb_10 <- sb_distances %>%
  arrange(feet) %>%
  filter(feet > 60) %>%
  head(40)

sb_solo <- select(sb_10, lat, lon, feet)
sb_solo2 <- select(sb_10, sb_lat, sb_lon, feet)
colnames(sb_solo2) <- c("lat", "lon", "feet")

sb_again <- rbind(sb_solo, sb_solo2)

# Mapping it
m <- leaflet(sb_again) %>% addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png', 
    attribution='Map tiles by <a href="http://stamen.com">Stamen Design</a>, <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a> &mdash; Map data &copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>% 
  setView(-98.483330, 38.712046, zoom = 4) %>% 
  addCircleMarkers(~lon, ~lat, popup=sb_again$feet, weight = 3, radius=4, 
                 color="#13ED3F", stroke = F, fillOpacity = 0.5)  %>%
  addLegend("bottomright", colors= "#13ED3F", labels="Starbucks", title="End of the Universe")


m

Making choropleths

To make a choropleth map, you first need a shapefile or geojson of the polygons that you’re filling in.

Importing shapefiles from the Census

You could download and import the shapefile into R yourself, but there’s a package that brings in Census shapefiles for you called Tigris.

This is what the U.S. states looks like on in leaflet R.

# Polygon stuff from shape file
# install.packages("tigris")
library(tigris)

states <- states(cb=T)

# Let's quickly map that out
states %>% leaflet() %>% addTiles() %>% addPolygons(popup=~NAME)

This is how it looks raw. The Census shape files also include territories.

When mapping, we’ll have to remember to exclude them if they show up.

Joining data to a shapefile

Let’s make a choropleth map based on number of Starbucks per state

# First, we'll use dplyr to summarize the data
# count by state
sb_state <- starbucks %>%
  group_by(Province) %>%
  summarize(total=n())

# Some quick adjustments to the the dataframe to clean up names
sb_state$type <- "Starbucks"
colnames(sb_state) <- c("state", "total", "type")

# Now we use the Tigris function geo_join to bring together 
# the states shapefile and the sb_states dataframe -- STUSPS and state 
# are the two columns they'll be joined by
states_merged_sb <- geo_join(states, sb_state, "STUSPS", "state")

# Creating a color palette based on the number range in the total column
pal <- colorNumeric("Greens", domain=states_merged_sb$total)

# Getting rid of rows with NA values
states_merged_sb <- subset(states_merged_sb, !is.na(total))

# Setting up the pop up text
popup_sb <- paste0("Total: ", as.character(states_merged_sb$total))

# Mapping it with the new tiles CartoDB.Positron
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>% 
  addPolygons(data = states_merged_sb , 
              fillColor = ~pal(states_merged_sb$total), 
              fillOpacity = 0.7, 
              weight = 0.2, 
              smoothFactor = 0.2, 
              popup = ~popup_sb) %>%
  addLegend(pal = pal, 
            values = states_merged_sb$total, 
            position = "bottomright", 
            title = "Starbucks")

Hmm… Not that interesting, right?

What’s the problem here. You know what’s wrong.

This is essentially a population map.

So we need to adjust for population.

And that’s easy to do using the Census API.

Bringing in Census data via API

We’ll use the censusapi package created by journalist Hannah Recht.

# install.packages("devtools")
# devtools::install_github("hrecht/censusapi")

library(censusapi)

## 
## Attaching package: 'censusapi'

## The following object is masked from 'package:methods':
## 
##     getFunction

# Pulling in the key.R script that has my census api key. 
# It will be disabled after this weekend so get your own
# http://api.census.gov/data/key_signup.html

source("key.R") 

# We won't go over all the functions, but uncomment the lines below to see 
# the available variables 
# vars2015 <- listCensusMetadata(name="acs5", vintage=2015, "v")
# View(vars2015)

# Alright, getting total population by state from the API
state_pop <-  getCensus(name="acs5", 
                        vintage=2015,
                        key=census_key, 
                        vars=c("NAME", "B01003_001E"), 
                        region="state:*")

datatable(head(state_pop))

# Cleaning up the column names
colnames(state_pop) <- c("NAME", "state_id", "population")
state_pop$state_id <- as.numeric(state_pop$state_id)
# Hm, data comes in numbers of fully spelled out, not abbreviations

# Did you know R has its own built in list of State names and State abbreviations?
# Just pull it in this way to create a dataframe for reference

state_off <- data.frame(state.abb, state.name)

# So I needed to create the dataframe above because the Census API data 
# gave me states with full names while the Starbucks data came with abbreviated state names
# So I needed a relationship dataframe so I could join the two

# Cleaning up the names for easier joining
colnames(state_off) <- c("state", "NAME")

# Joining state population dataframe to relationship file
state_pop <- left_join(state_pop, state_off)

## Joining, by = "NAME"

## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factor and character vector, coercing into character vector

# The relationship dataframe didnt have DC or Puerto Rico, so I'm manually putting those in
state_pop$state <- ifelse(state_pop$NAME=="District of Columbia", "DC", as.character(state_pop$state))
state_pop$state <- ifelse(state_pop$NAME=="Puerto Rico", "PR", as.character(state_pop$state))

# Joining Starbucks dataframe to adjusted state population dataframe
sb_state_pop <- left_join(sb_state, state_pop)

## Joining, by = "state"

# Calculating per Starbucks stores 100,000 residents and rounding to 2 digits
sb_state_pop$per_capita <- round(sb_state_pop$total/sb_state_pop$population*100000,2)

# Eliminating rows with NA
sb_state_pop <- subset(sb_state_pop, !is.na(per_capita))
datatable(head(sb_state_pop))

Final map

states_merged_sb_pc <- geo_join(states, sb_state_pop, "STUSPS", "state")

pal_sb <- colorNumeric("Greens", domain=states_merged_sb_pc$per_capita)
states_merged_sb_pc <- subset(states_merged_sb_pc, !is.na(per_capita))

popup_sb <- paste0("Per capita: ", as.character(states_merged_sb_pc$per_capita))

leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>% 
  addPolygons(data = states_merged_sb_pc , 
              fillColor = ~pal_sb(states_merged_sb_pc$per_capita), 
              fillOpacity = 0.9, 
              weight = 0.2, 
              smoothFactor = 0.2, 
              popup = ~popup_sb) %>%
  addLegend(pal = pal_sb, 
            values = states_merged_sb_pc$per_capita, 
            position = "bottomright", 
            title = "Starbucks<br />per 100,000<br/>residents")

Final map II

The Leaflet for R package was recently updated, adding functionality like highlighting polygons and labels on hover.

This options include highlight and labelOptions. Read more at rstudio.

states_merged_sb_pc <- geo_join(states, sb_state_pop, "STUSPS", "state")

pal_sb <- colorNumeric("Greens", domain=states_merged_sb_pc$per_capita)
states_merged_sb_pc <- subset(states_merged_sb_pc, !is.na(per_capita))

popup_sb <- paste0("Per capita: ", as.character(states_merged_sb_pc$per_capita))

leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>% 
  addPolygons(data = states_merged_sb_pc , 
              fillColor = ~pal_sb(states_merged_sb_pc$per_capita), 
              fillOpacity = 0.9, 
              weight = 0.2, 
              smoothFactor = 0.2,
              highlight = highlightOptions(
                  weight = 5,
                  color = "#666",
                  dashArray = "",
                  fillOpacity = 0.7,
                   bringToFront = TRUE),
              label=popup_sb,
              labelOptions = labelOptions(
    style = list("font-weight" = "normal", padding = "3px 8px"),
    textsize = "15px",
    direction = "auto")) %>%
  addLegend(pal = pal_sb, 
            values = states_merged_sb_pc$per_capita, 
            position = "bottomright", 
            title = "Starbucks<br />per 100,000<br/>residents")

Mapping with R

Andrew Ba Tran

NICAR 2017