Using acs.R to create choropleth maps

Posted by Ezra Glenn on July 15, 2013
Census, Code, Maps

Some time ago, FlowingData issued a challenge to create choropleth maps with open source tools, resulting in some nice little scripts using R’s ggplot2 and maps packages — all nicely covered on the Revolution Analytics blog. Some users have recently asked whether the acs package can be used to create similar maps, and the answer (of course) is yes. Here’s how.

For starters, to manage expectations, keep in mind that the map_data() function, which actually generates the geographic data you need to plot maps, does not currently provide boundary data for very many Census geographies — so sadly a lot of the new expanded support for various Census summary levels built into the acs package can’t be used. What you can do, however, is very easily plot state and county maps, which is what we’ll showcase below.

Secondly, the statistician in me feels compelled to point out one problem with using the acs package for these sorts of maps: choropleth maps are a fun way to quickly show off the geographic distribution of some key statistic, but there is a price: in order to plot the data, we are limited to a single number for each polygon. As a result, we can plot estimates from the ACS, but only if we are willing to ignore the margins of error and pretend that the data is less “fuzzy” than it really is. Given that the whole point of the acs package was to call attention to standard errors, and provide tools to work with them, it seems sort of counter-intuitive to use the package in this way. Still, given the ease of downloading ACS data through the acs.fetch() function, it may be a good (although slightly irresponsible) use of the package.

Given all that, I’ll step back down off my high horse, and show you how to make some maps, drawing heavily on the scripts provided by Hadley Wickham for using the ggplot2 package. For this example, let’s look at the percentage of people who take public transportation to work in the U.S., by county.

For starters, we’ll need to install and load the required packages:

> install.packages("acs")
> install.packages("ggplot2")
> install.packages("maps")
> library(acs)
> library(ggplot2)
> library(maps)

If you haven’t already obtained and installed an API key from the Census, you’ll need to do that as well — see ?api.key.install or check section 3.3 in the user guide.

Next, we use the map_data function to create some map boundary files to use later.

# load the boundary data for all counties
> county.df=map_data("county")
# rename fields for later merge
> names(county.df)[5:6]=c("state","county")
> state.df=map_data("state")

Turning to the acs package, we create a new geo.set consisting of all the counties in all the states, and in a single call to acs.fetch download Census table B08301, which contains data on Means of Transportation to Work. (If we didn’t know the table number we needed, we could use acs.lookup() to search for likely candidates, or even pass search strings directly to the acs.fetch() function.)

# wow! a single geo.set to hold all the counties...?
> us.county=geo.make(state="*", county="*")
# .. and a single command to fetch the data...!
> us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")

The data we’ve fetched includes estimates of raw numbers of workers, not percentages, so we’ll need to do some division. Since we are interested in a proportion (and not a ratio), we need to use the divide.acs function, not just “/” (see footnote 1). For our dataset, the 10th column is the number of workers taking public transportation to work, and the 1st column is the total number of workers in the county. (See acs.colnames(us.transport) to verify this.) After we complete the division, we extract the estimates into a new data.frame, along with state and county names for each. (We need to do a little string manipulation to make these fields match those from the map_data function above.)

> us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
> pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
# this next step is all for Louisiana!
> pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
# clean up county names and find the states
> pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
> pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
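
The string cleanup above assumes every geography label contains the word "County" (or "Parish"). As a rough check of what happens to labels that do not, here is a minimal base-R sketch that runs the same three gsub() steps on a handful of sample labels. The labels are typed in by hand for illustration, not fetched from the Census, and the result and unmatched names are hypothetical helpers.

```r
# Hand-typed sample labels in the "<area>, <state>" form assumed above
sample.names <- c("Middlesex County, Massachusetts",
                  "Orleans Parish, Louisiana",
                  "Juneau City and Borough, Alaska",
                  "Alexandria city, Virginia")

# the same three cleanup steps used in the post
cleaned <- gsub("Parish", "County", sample.names)
state   <- tolower(gsub("^.*County, ", "", cleaned))
county  <- tolower(gsub(" County,.*", "", cleaned))

result <- data.frame(original = sample.names, state = state, county = county,
                     stringsAsFactors = FALSE)

# if the pattern never matched, the "state" field still contains a comma
result$unmatched <- grepl(",", result$state, fixed = TRUE)
print(result)
```

Labels flagged as unmatched will not line up with the boundary data in a later merge, so those areas would need their own cleanup rules if you want them on the map.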

Next, following Wickham’s script, we merge the boundaries with the data into a new data.frame (called choropleth) and reorder and recode it for out map levels.

> choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
> choropleth=choropleth[order(choropleth$order), ]
> choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=TRUE)
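
One thing to keep in mind: by default merge() keeps only rows whose state and county appear in both data.frames, so any county whose name is spelled differently in the two sources quietly drops out and shows up as a hole in the map. The following is a small sketch of one way to list those rows before merging. The toy.map and toy.est data.frames are made-up stand-ins for county.df and pub.trans.est, and the spelling difference shown is only an assumed example.

```r
# Made-up stand-ins for county.df and pub.trans.est (values are hypothetical)
toy.map <- data.frame(state  = c("louisiana", "louisiana", "virginia"),
                      county = c("orleans", "st martin", "fairfax"),
                      stringsAsFactors = FALSE)
toy.est <- data.frame(state  = c("louisiana", "louisiana", "virginia"),
                      county = c("orleans", "st. martin", "fairfax"),
                      percent.pub.trans = c(0.07, 0.002, 0.09),
                      stringsAsFactors = FALSE)

# build one "state,county" key per row in each table
map.keys <- unique(paste(toy.map$state, toy.map$county, sep = ","))
est.keys <- paste(toy.est$state, toy.est$county, sep = ",")

# estimate rows with no polygon to attach to
unmatched <- toy.est[!(est.keys %in% map.keys), ]
print(unmatched)
```

Running the same check with the real county.df and pub.trans.est would show which counties need extra string cleanup.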

And voila – a single call to ggplot and we have our map!

> ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")
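
The legend for this map takes its text straight from the factor levels that cut() generates, which are interval labels such as (0.01,0.02]. If you would rather show percentages, cut() accepts a labels argument. Here is a minimal sketch using the same breaks; the vector x and the label wording are invented for illustration.

```r
# same breaks as in the post
breaks <- c(0, .01, .02, .03, .04, .05, .1, 1)

# made-up proportions standing in for choropleth$percent.pub.trans
x <- c(0, 0.004, 0.015, 0.07, 0.31)

# default interval-style labels
default.bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# illustrative percent labels: one per interval, i.e. length(breaks) - 1
pct.labels <- c("0-1%", "1-2%", "2-3%", "3-4%", "4-5%", "5-10%", "10-100%")
labelled.bins <- cut(x, breaks = breaks, include.lowest = TRUE,
                     labels = pct.labels)

print(table(default.bins))
print(table(labelled.bins))
```

Passing the relabelled factor to the fill aesthetic should then carry the friendlier labels into the legend.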

[Figure: county-level choropleth map of public transportation use (choropleth_county_pub_trans.jpg)]

(For those who would like to see the entire script in all its efficient 14-line glory, I’ve pasted it below for easy cutting and pasting.)

install.packages("acs")
install.packages("ggplot2")
install.packages("maps")
library(acs)
library(ggplot2)
library(maps)
county.df=map_data("county")
names(county.df)[5:6]=c("state","county")
state.df=map_data("state")
us.county=geo.make(state="*", county="*")
us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")
us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
choropleth=choropleth[order(choropleth$order), ]
choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=TRUE)
ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")


Footnotes:

1 Technically, since we are going to ignore the standard errors in our map, this could just be a standard division using “/”, but we might later want to look at the margins of error, etc. (For more on this issue, see ?divide.acs.)
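
To give a rough sense of why the distinction matters once standard errors are in play, here is a small base-R sketch of the two approximations described in Census Bureau guidance for ACS users: one for a ratio, and one for a proportion where the numerator is a subset of the denominator. All the numbers are made up, and this illustrates the general formulas rather than the internals of divide.acs.

```r
# Hypothetical county: workers using public transportation (x)
# out of all workers (y), each with a made-up standard error
x <- 1200;  se.x <- 150
y <- 40000; se.y <- 600

p <- x / y

# ratio approximation: the two variance terms add
se.ratio <- sqrt(se.x^2 + p^2 * se.y^2) / y

# proportion approximation: the second term is subtracted; if that makes
# the value under the square root negative, fall back to the ratio form
radicand <- se.x^2 - p^2 * se.y^2
se.prop  <- if (radicand >= 0) sqrt(radicand) / y else se.ratio

print(c(estimate = p, se.proportion = se.prop, se.ratio = se.ratio))
```

The point estimate is the same either way, which is why plain “/” is enough for the map; only the standard error changes.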
