Presenting acs.R at the ACS Data User Conference

Posted by Ezra Glenn on April 06, 2015
Census, Code

On May 12, 2015, I’ll be presenting the acs.R package in a session of the American Community Survey Data User Group Conference in Hyattsville, MD. The paper, titled “Estimates with errors and errors with estimates: Using the R ‘acs’ package for analysis of American Community Survey data,” is available through the SSRN or my faculty publications webpage.

Better yet, the session will also include a presentation by Michael Laviolette, Dennis Holt, and Kristin K. Snow of the State of New Hampshire Department of Health and Human Services on “Using the R Language and ‘acs’ Package to Compile and Update a Social Vulnerability Index for New Hampshire.” It’s great to see how planners are using and extending this package in all sorts of exciting new settings and applications.

Click these links to see the complete program or to register for the conference.

acs.R useRs: Share your success stories

Posted by Ezra Glenn on October 20, 2014
Census, Code, Free Software, Open-Source

Have you been using the acs.R package to download and analyze Census data in your work? Do you have a story you’d be willing to share, to help us promote the package and show off all the cool ways people are using open source tools to make sense of data and help inform communities, policy-makers, and researchers? If so, please let us know: email us your news or project descriptions and we’ll post them here to inform and inspire our readers. Be sure to include good images, news coverage, quotations, or other materials to help tell the story — and feel free to include links, scripts, or examples as well.


PS: Don’t forget to subscribe to the acs.R mailing list to remain in touch with the growing acs.R user community.

acs.R version 1.2: Now, with 2012 data

Posted by Ezra Glenn on January 22, 2014
Census, Code, Self-promotion

As some of you have noticed, the new five-year Census ACS data has just come out, and is now available via the Census API. To make sure you are able to fetch the freshest possible data to play with in R, I’ve updated the acs.R package to version 1.2, which now includes full support for the 2008–2012 ACS data.

The latest version is now available on the CRAN repository. If you’ve already installed the package in the past, you can easily update with the update.packages() command; if you’ve never installed it, you can just as easily install it for the first time, by simply typing install.packages("acs"). In either case, be sure to load the library after installing by typing library(acs), and install (or re-install) an API key with api.key.install() — see the documentation and the latest version of the acs user guide for more info.
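Collected in one place, the update-and-load sequence looks like this (the key string below is a placeholder for your own Census API key):

```r
# update an existing installation (updates acs along with other packages)...
update.packages()
# ...or install for the first time
install.packages("acs")
# load the package and install your API key (stored for future sessions)
library(acs)
api.key.install(key = "YOUR-CENSUS-API-KEY")
```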

To get the latest data, just continue to use the acs.fetch() function as usual, but specify endyear=2012. (By default, endyear is set to 2011 if no year is explicitly passed to acs.fetch, and I didn’t want to change this for fear of breaking existing user scripts. In the future, we might want to rethink this, so that it selects the most recent endyear by default. Thoughts?)
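For example, fetching the new 2008–2012 five-year estimates of total population (table B01003, used elsewhere on this blog) for a state’s counties is just a matter of adding the endyear argument; the particular geography here is only an illustration:

```r
library(acs)
# all the counties in Massachusetts, as a single geo.set
ma.counties <- geo.make(state = "MA", county = "*")
# explicitly request the 2008-2012 five-year data
ma.pop <- acs.fetch(geography = ma.counties, endyear = 2012,
                    table.number = "B01003")
```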

(Note: If you’re not sure which version you are using, you can always type packageVersion("acs") to find out.)

New choropleth package in R

Posted by Ezra Glenn on January 22, 2014
Census, Code, Free Software, Open-Source

A while back I posted a recipe (based on some great examples on the Revolution Analytics blog) showing how to use the acs package in R to create choropleth maps. Now, through the magic of open-source software development — and the hard work of developer Ari Lamstein and the generosity of his employers — this process has gotten even easier: I call your attention to Ari’s new choroplethr package for R.

Ari is a Senior Software Engineer at Trulia, where he works on data science and visualization, primarily related to real estate and housing markets. As part of the company’s “Innovation Week” he developed the choroplethr package, moving well beyond the sample scripts to create a powerful suite of mapping functions. With a single command, a user can now generate maps at the state, county, or zip code level, from any of the data available via the ACS.

The package is not yet up on CRAN, but Ari promises that’s in the works; for now, you can learn more about it — including installation instructions using install_github() — on the Trulia Tech + Design blog. (I’m of course proud to note that the acs.R package lies at the foundation of these tools, doing the heavy-lifting of fetching and processing up-to-date data from the American Community Survey — but Ari’s work is already moving beyond these humble roots, allowing users to create choropleth maps of any data they can get their hands on….)
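Pending its arrival on CRAN, installation from GitHub presumably looks something like the following; the repository path here is a guess, so check the Trulia blog post for the actual instructions:

```r
# install_github() comes from the devtools package
install.packages("devtools")
library(devtools)
# repository path is an assumption -- see the Trulia post for specifics
install_github("trulia/choroplethr")
library(choroplethr)
```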

To learn more about the types of projects undertaken by Trulia staffers during Innovation Week, see this short video. Congratulations — and thanks — to both Ari and Trulia for helping to drive innovation forward in R and other open source projects.

Regionalization from the Ground Up: Apps Get Towns Working Together

Posted by Ezra Glenn on November 08, 2013
Data, Regionalization

Others have written—and I’m sure will continue to write—with more enthusiasm and hyperbole about the ways that new web portals and mobile apps are changing the landscape of public participation and responsive city planning: it seems that we are constantly being showered (or perhaps barraged?) with fun new social media tools to engage citizens and activate urban sharing networks – for everything from reporting graffiti to mapping public murals (yes, the irony is noteworthy), from finding a parking space to avoiding being mugged, and so on. Whether or not these apps will ever wind up being the “game changers” we are often promised remains to be seen, but the level of excitement and activity they are generating is undeniable, especially after so many years of resignation and inattention to urban problems.

That said, despite the energy that has been thrown behind developing—and promoting—these new weapons in our urban information arsenal, one aspect of these tools has been noticeably overlooked: the potential they provide to facilitate regional collaboration between municipalities, an as-yet-unfulfilled dream of urban planners for the past century.

For starters, consider the unremarkable case of permits for “open burning” in Massachusetts. Regardless of whether one supports the concept of open-air burning of brush, clippings, forest debris, and agricultural waste (and there are many reasons to oppose the idea), the law in most states still allows this practice, with all sorts of regulations and permitting requirements. In Massachusetts—where, famously, “all politics is local”—it is not surprising to learn that while the State Department of Environmental Protection has established a broad policy framework for the issue, actual permits must be obtained from one of the state’s 351 different municipalities, typically from the local fire chief. (And, of course, 351 different municipalities means 351 different addresses, 351 different forms, 351 different hours of operation, and so on.)

If you’re an old-timer (and chances are, most open-air burners are), this probably doesn’t strike you as all that unusual — just head down to the fire station, grab a cup of coffee, chat with some of the other old-timers, and maybe pick up a permit while you’re there: it sounds rather civilized, in fact, and quite communal. That said, it is nonetheless a pretty inefficient system: why can’t we just do this from home, via some on-line interface?

Well, if you’re lucky enough to live in Berkshire County, you probably can. Residents in 12 of the county’s towns can visit the Berkshire County Online Application for Open Burning Permits to read the regs and apply in real-time for a permit. The site is simple—crude, even—without any bells or whistles, and the process is still a bit arcane (permits are only available between 8:30 AM and 1:00 PM; if you live in the Town of Dalton “you must first visit the fire station between 8 am and 2 pm and pay a $5 fee for the season,” and so on), but it gets the job done. And more importantly, this single little unassuming website represents a major step in regionalization, breaking down 12 little principalities of permitting power to deliver simpler, more consistent, and more efficient municipal services across the county.

Indeed, one of the real strengths of apps and online portals is their potential for scalability: once one agency creates a tool to solve a common problem (such as issuing burn permits), there is little cost to sharing it with others. If it’s done well and widely adopted, it can even help set the standard for entire urban information networks, which is what we are beginning to see on the other end of the state, with a tool called Commonwealth Connect. This mobile app (originally known as Citizens Connect) was developed by the coders at the City of Boston’s Office of New Urban Mechanics to help empower residents to “be the eyes and ears of the City,” reporting potholes, vandalism, missing street signs, graffiti, and the like. Recognizing that the virtues of this “participatory urbanism” do not stop at the city border, the state’s Community Innovation Challenge Grant Program provided $400,000 in funding to expand the program, which now seamlessly serves over 40 cities and towns in the region.

Stories such as these bring new hope to the vision of metropolitan regionalism. As always, the devil is likely to be in the details of implementation, but by starting small, scaling up, and working incrementally through the challenges of cooperation to improve the delivery of some of these basic services—and in the process, recognizing some cost savings and economies of scale—we are starting to see the inklings of a quiet revolution. And, in time, I expect that this sort of “regionalism from the ground-up” is likely to result in more lasting change than the top-down approaches of the past.

Using acs.R for a t-test

Posted by Ezra Glenn on November 06, 2013
Census

I’ve been asked to provide a very quick example of using the acs.R package to conduct a t-test of significance when comparing ACS data from two different geographical areas—so here goes: a quick example.

Let’s look at the number of school-age children in different towns on Martha’s Vineyard. There are seven towns on this island, and luckily (like all New England towns) they are each represented as a different “county subdivision;” in this case, they together make up Dukes County. So we can get age demographics for all seven of them with two quick commands:

> library(stringr)  # for str_replace, used below
> towns=geo.make(state="MA", county="Dukes", county.subdivision="*")
>, table.number="B01001", col.names="pretty")
> # one more step just to shorten geography -- just town names
> geography([,1]=str_replace(
      geography([,1]," town.*","")

If you look at the column names using acs.colnames(, you will see that we are most interested in columns 4-6 (male children age 5-17) and 28-30 (female children, same ages). We also might need column 1 (the total population for the town), for the purpose of calculating percentages later on.
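To sketch where this is headed (a hand-rolled comparison, not a built-in acs function): since acs objects support arithmetic that tracks standard errors, we can combine the male and female columns, pull out estimates and standard errors with estimate() and standard.error(), and compute a two-sample z-statistic for the difference between any two towns. The row indices below are placeholders:

```r
# combine school-age columns: male 5-17 (cols 4:6) plus female 5-17 (28:30)
kids <-[, 4] +[, 5] +[, 6] +[, 28] +[, 29] +[, 30]
# extract estimates and standard errors for each town
est <- estimate(kids)
se  <- standard.error(kids)
# z-statistic comparing the towns in rows 1 and 2 (indices are placeholders)
z <- (est[1] - est[2]) / sqrt(se[1]^2 + se[2]^2)
# |z| > 1.96 suggests a significant difference at roughly the 95% level
```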


Census API Down Due to Breakdown of Federal Government

Posted by Ezra Glenn on October 02, 2013
Census, Code

Due to the shutdown of the federal government, it appears that many federal websites are down as well, including the Census API. As a result, the acs.R package is currently unable to download data – sorry! If you try to use acs.fetch or anything related that requires the API, you will probably get the following error:

> acs.fetch(geo=geo.make(state=25), table.number="B01003")
Error in file(file, "rt") : cannot open the connection

I assume that once this all gets sorted out, the site will come right back as before.

acs.R Webinar: now online

Posted by Ezra Glenn on September 04, 2013
Census

For those who were unable to attend the recent acs.R webinar (or if you fell asleep halfway through), we’ve posted the complete video, the presentation slides, and the webinar demo script. Enjoy, and thanks again to Ray DiGiacomo and the Orange County R User Group for hosting this event.

acs.R Webinar: Aug 29

Posted by Ezra Glenn on August 07, 2013
Census

On August 29, 2013 (1:00PM Eastern/10:00AM Pacific) I’ll be presenting a free webinar on using the acs package, hosted by the Orange County R User Group. This one-hour on-line event will provide a general introduction to the acs package, including a live demonstration and time for Q&A.


  • Speaker: Ezra Haber Glenn, Lecturer at the Massachusetts Institute of Technology; maintainer, acs.R package
  • Moderator: Ray DiGiacomo, President of The Orange County R User Group
  • Registration (free):

Please note that in addition to attending from your laptop or desktop computer, you can also attend from a Wi-Fi connected iPhone, iPad, Android phone or Android tablet by installing the GoToMeeting App.

Using acs.R to create choropleth maps

Posted by Ezra Glenn on July 15, 2013
Census, Code, Maps

Some time ago, FlowingData issued a challenge to create choropleth maps with open source tools, resulting in some nice little scripts using R’s ggplot2 and maps packages — all nicely covered on the Revolution Analytics blog. Some users have recently asked whether the acs package can be used to create similar maps, and the answer (of course) is yes. Here’s how.

For starters, to manage expectations, keep in mind that the map_data() function, which actually generates the geographic data you need to plot maps, does not currently provide boundary data for very many Census geographies — so sadly a lot of the new expanded support for various Census summary levels built into the acs package can’t be used. What you can do, however, is very easily plot state and county maps, which is what we’ll showcase below.

Secondly, the statistician in me feels compelled to point out one problem with using the acs package for these sorts of maps. Choropleth maps are really fun for quickly showing off the geographic distribution of some key statistic, but there is a price: in order to plot the data, we are really limited to a single number for each polygon. As a result, we can plot estimates from the ACS, but only if we are willing to ignore the margins of error and pretend that the data is less “fuzzy” than it really is. Given that the whole point of the acs package was to call attention to standard errors, and to provide tools to work with them, it seems counter-intuitive to use the package in this way; still, given the ease of downloading ACS data through the acs.fetch() function, it may be a good (although slightly irresponsible) use of the package.

Given all that, I’ll step back down off my high horse, and show you how to make some maps, drawing heavily on the scripts provided by Hadley Wickham for using the ggplot2 package. For this example, let’s look at the percentage of people who take public transportation to work in the U.S., by county.

For starters, we’ll need to install and load the required packages:

> install.packages("acs")
> install.packages("ggplot2")
> install.packages("maps")
> library(acs)
> library(ggplot2)
> library(maps)

If you haven’t already obtained and installed an API key from the Census, you’ll need to do that as well — see ?api.key.install or check section 3.3 in the user guide.

Next, we use the map_data function to create some map boundary files to use later.

# load the boundary data for all counties
> county.df=map_data("county")
# rename fields for later merge
> names(county.df)[5:6]=c("state","county")
> state.df=map_data("state")

Turning to the acs package, we create a new geo.set consisting of all the counties in all the states, and in a single call to acs.fetch download Census table B08301, which contains data on Means of Transportation to Work. (If we didn’t know the table number we needed, we could use acs.lookup() to search for likely candidates, or even pass search strings directly to the acs.fetch() function.)

# wow! a single geo.set to hold all the counties...?
> us.county=geo.make(state="*", county="*")
# .. and a single command to fetch the data...!
> us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")

The data we’ve fetched includes estimates of raw numbers of workers, not percentages, so we’ll need to do some division. Since we are interested in a proportion (and not a ratio), we need to use the divide.acs() function, not just “/”.1 For our dataset, the 10th column is the number of workers taking public transportation to work, and the 1st column is the total number of workers in the county. (See acs.colnames(us.transport) to verify this.) After we complete the division, we extract the estimates into a new data.frame, along with state and county names for each. (We need to do a little string manipulation to make these fields match with those from the map_data function above.)
>[,10], 
     denominator=us.transport[,1], method="proportion")
> pub.trans.est=data.frame(county=geography([[1]],
# this next step is all for Louisiana!
> pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
# clean up county names and find the states
> pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
> pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))

Next, following Wickham’s script, we merge the boundaries with the data into a new data.frame (called choropleth) and reorder and recode it for our map levels.

> choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
> choropleth=choropleth[order(choropleth$order), ]
> choropleth$pub.trans.rate.d=cut(choropleth$, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)

And voila – a single call to ggplot and we have our map!

> ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")


(For those who would like to see the entire script in all its efficient 14-line glory, I’ve pasted it below for easy cutting and pasting.)

county.df=map_data("county")
names(county.df)[5:6]=c("state","county")
state.df=map_data("state")
us.county=geo.make(state="*", county="*")
us.transport=acs.fetch(geography=us.county, table.number="B08301", col.names="pretty")[,10], denominator=us.transport[,1], method="proportion")
pub.trans.est=data.frame(county=geography([[1]],
pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
choropleth=choropleth[order(choropleth$order), ]$, breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)
ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")



1 Technically, since we are going to ignore the standard errors in our map, this could just be a standard division using “/”, but we might later want to look at the margins of error, etc. (For more on this issue, see ?divide.acs.)
