Using acs.R for a t-test

Posted by Ezra Glenn on November 06, 2013
Census / No Comments

I’ve been asked to provide a very quick example of using the acs.R package to conduct a t-test of significance when comparing ACS data from two different geographical areas—so here goes: a quick example.

Let’s look at the number of school-age children in different towns on Martha’s Vineyard. There are seven towns on this island, and luckily (like all New England towns) they are each represented as a different “county subdivision;” in this case, they together make up Dukes County. So we can get some quick age demographics for all seven of them in two quick commands:

> towns=geo.make(state="MA", county="Dukes", county.subdivision="*")
>, table.number="B01001","pretty")
> # one more step just to shorten geography -- just town names
> geography([,1]=str_replace(
      geography([,1]," town.*","")

If you look at the column names using acs.colnames(, you will see that we are most interested in columns 4-6 (male children age 5-17) and 28-30 (female children, same ages). We also might need column 1 (the total population for the town), for the purpose of calculating percentages later on.

Continue reading…

Urban Planning and the Free Software Movement: A Conversation

Posted by Ezra Glenn on November 05, 2013
Government, Open-Source / No Comments

Join the MIT Open Source Planning Tools Groups and Zak Rogoff, Campaigns Manager at the Free Software Foundation, to learn about new developments in the world of free software – and discover how you can be part of the movement to create, perfect, and promote the use of free and open tools for urban planners (and others) to use. Co-sponsored by LibrePlanet Boston. Open to: the general public.

Time: 6:00p–8:00p Location: MIT Room 7-429

Census API Back In Action: acs.R is once again fully functional

Posted by Ezra Glenn on October 18, 2013
Census, Code / No Comments

Now that the federal government is back on its feet, the Census API is serving up data again, and the acs.R package should work like it was supposed to.

For example:

> acs.fetch(geo=geo.make(state=25), table.number="B01003")
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
Massachusetts 6512227 +/- 0

Census API Down Due to Breakdown of Federal Government

Posted by Ezra Glenn on October 02, 2013
Census, Code / No Comments

Due to the shutdown of the federal government, it appears that many federal websites are down as well, including the Census API (see As a result, the acs.R package is currently unable to download data – sorry! If you to use acs.fetch or anything related that requires the API, you will probably get the following error:

> acs.fetch(geo=geo.make(state=25), table.number="B01003")
Error in file(file, "rt") : cannot open the connection

I assume that once this all gets sorted out, the site will come right back as before.

acs.R Webinar: now online

Posted by Ezra Glenn on September 04, 2013
Census / No Comments

For those who were unable to attend the recent acs.R webinar (or if you fell asleep halfway through), we’ve posted the complete video, the presentation slides, and the webinar demo script. Enjoy, and thanks again to Ray DiGiacomo and the Orange County R User for hosting this event.

acs.R Webinar: Aug 29

Posted by Ezra Glenn on August 07, 2013
Census / No Comments

On August 29, 2013 (1:00PM Eastern/10:00AM Pacific) I’ll be presenting a free webinar on using the acs package, hosted by the Orange County R User Group. This one-hour on-line event will provide a general introduction to the acs package, including a live demonstration and time for Q&A.


  • Speaker: Ezra Haber Glenn, Lecturer at the Massachusetts Institute of Technology; maintainer, acs.R package
  • Moderator: Ray DiGiacomo, President of The Orange County R User Group
  • Registration (free):

Please note that in addition to attending from your laptop or desktop computer, you can also attend from a Wi-Fi connected iPhone, iPad, Android phone or Android tablet by installing the GoToMeeting App.

Using acs.R to create choropleth maps

Posted by Ezra Glenn on July 15, 2013
Census, Code, Maps / No Comments

Some time ago, FlowingData issued a challenge to create choropleth maps with open source tools, resulting in some nice little scripts using R’s ggplot2 and maps packages — all nicely covered on the Revolution Analytics blog. Some users have recently asked whether the acs package can be used to create similar maps, and the answer (of course) is yes. Here’s how.

For starters, to manage expectations, keep in mind that the map_data() function, which actually generates the geographic data you need to plot maps, does not currently provide boundary data for very many Census geographies — so sadly a lot of the new expanded support for various Census summary levels built into the acs package can’t be used. What you can do, however, is very easily plot state and county maps, which is what we’ll showcase below.

Secondly, the statistician in me feels compelled to point out one problem with using the acs package for these sorts of maps: choropleth maps are really fun to use to quickly show off the geographic distribution of some key statistic, but there is a price: in order to plot the data, we are really limited to a single number for each polygon. As a result, we can plot estimates from the ACS, but only if we are willing to ignore the margins of errors, and pretend that the data is less “fuzzy” than it really is. Given that the whole point of the acs package was to call attention to standard errors, and provide tools to work with them, it seems sort of counter-intuitive to use the package in this way — but given the ease of downloading ACS through the acs.fetch() function, it may still be a good (although slightly irresponsible) use of the package.

Given all that, I’ll step back down off my high horse, and show you how to make some maps, drawing heavily on the scripts provided by Hadley Wickham for using the ggplot2 package. For this example, let’s look at the percentage of people who take public transportation to work in the U.S., by county.

For starters, we’ll need to install and load the required packages:

> install.packages("acs")
> install.packages("ggplot2")
> install.packages("maps")
> library(acs)
> library(ggplot2)
> library(maps)

If you haven’t already obtained and installed an API key from the Census, you’ll need to do that as well — see ?api.key.install or check section 3.3 in the user guide.

Next, we use the map_data function to create some map boundary files to use later.

# load the boundary data for all counties
> county.df=map_data("county")
# rename fields for later merge
> names(county.df)[5:6]=c("state","county")
> state.df=map_data("state")

Turning to the acs package, we create a new geo.set consisting of all the tracts for all the state, and in a single call to acs.fetch download Census table B08301, which contains data on Mean of Transportation to Work. (If we didn’t know the table number we needed, we could use acs.lookup() to search for likely candidates, or even pass search strings directly to the acs.fetch() function.)

# wow! a single geo.set to hold all the counties...?
> us.county=geo.make(state="*", county="*")
# .. and a single command to fetch the data...!
> us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")

The data we’ve fetched includes estimates of raw numbers of workers, not percentages, so we’ll need to do some division. Since we are interested in a proportion (and not a ratio), we to use the divide.acs function, not just “/”.1 For our dataset, the 10th column is the number of workers taking public transportation to work, and the 1st column is the total number of workers in the county. (See acs.colnames(us.transport) to verify this.) After we complete the division, we extract the estimates into a new data.frame, along with state and county names for each. (We need to do a little string manipulation to make these fields match with those from the map_data function above.)

     denominator=us.transport[,1], method="proportion")
> pub.trans.est=data.frame(county=geography([[1]],
# this next step is all for Louisiana!
> pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
# clean up county names and find the states
> pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
> pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))

Next, following Wickham’s script, we merge the boundaries with the data into a new data.frame (called choropleth) and reorder and recode it for out map levels.

> choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
> choropleth=choropleth[order(choropleth$order), ]
> choropleth$pub.trans.rate.d=cut(choropleth$, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)

And voila – a single call to ggplot and we have our map!

> ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")


(For those who would like to see the entire script in all its efficient 14-line glory, I’ve pasted it below for easy cutting and pasting.)

us.county=geo.make(state="*", county="*")
     table.number="B08301", col.names="pretty")[,10], 
     denominator=us.transport[,1], method="proportion")
pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
choropleth=choropleth[order(choropleth$order), ]
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)
ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")



1 Technically, since we are going to ignore the standard errors in our map, this could just be a standard division using “/”, but we might later want to look at the margins of error, etc. (For more on this issue, see ?divide.acs.)

Tags: , , , , ,

acs.R version 1.1: PUMAs and Zip Codes and MSAs, Oh My!

Posted by Ezra Glenn on July 14, 2013
Census, Code, Self-promotion / No Comments

Development continues on the acs package for R, with the latest update (version 1.1) now officially available on the CRAN repository. If you’ve already installed the package in the past, you can easily update with the update.packages() command; if you’ve never installed it, you can just as easily install it for the first time, by simply typing install.packages(“acs”). In either case, be sure to load the library after installing by typing library(acs), and install (or re-install) an API key with api.key.install() — see the documentation and the latest version of the acs user guide (which still references version 1.0).

Beyond improvements described in a previous post about version 1.0, the most significant change in the latest version is support for many more different combinations of census geography via the geo.make function. As described in the manual and on-line help, users can now specify options to create user-defined geographies composed of combinations of states, counties, county subdivisions, tracts, places, blockgroups (all available in the previous version), plus many more: public use microdata areas (PUMAs), metropolitan statistical areas (MSAs), combined statistical areas (CSAs), zip code tabulation areas, census regions and divisions, congressional district and state legislative districts (both upper and lower chambers), American Indian Areas, state school districts (of various types), New England County and Town Areas (NECTAs), and census urban areas. These geographies can be combined to create 25 different census summary levels, which can then even be bundled together to make even more complex geo.sets.

Once created and saved, these new user-defined geo.sets can be fed into the existing acs.fetch function to immediately download data from the ACS for these areas, combining them as desired in the process (and handling all those pesky estimates and margins of error in statistically-appropriate ways.)

We encourage you to update to the latest version and begin to explore the full power of the census data now available through the Census American Community Survey API. (And be sure to subscribe to the acs.R user group mailing list to be informed of future improvements.

Tags: , , , , ,

acs.R example: downloading all the tracts in a county or state

Posted by Ezra Glenn on July 03, 2013
Census, Code / No Comments

An acs.R user asks:

> How do I use acs to download all the census tracts? is there
> some handy command to do that?

Here’s some help:

All the tracts in a single county

You can’t automatically download all the tracts for the whole country (or even for an entire state) in a single step (but see below for ways to do this). If you just need all the tracts in a single county, it’s really simple — just use the “*” wildcard for the tract number when creating your geo.set.

The example below creates a geo.set for all the tracts in Middlesex County, Massachusetts, and then downloads data from ACS table B01003 on Total Population for them.

> my.tracts=geo.make(state="MA", county="Middlesex", tract="*") 
> acs.fetch(geography=my.tracts, table.number="B01003")

All the tracts in a state

If you happen to have a vector list of the names (or FIPS codes) of all the counties in a given state (or the ones you want), you could do something like this to get all the tracts in each of them:

> all.tracts=geo.make(state="MA", county=list.of.counties, 
> acs.fetch(geography=all.tracts, table.number="B01003")

As an added bonus, if you don’t happen to have a list of counties, but want to use the package to get one, you could do something like this:

> mass=acs.fetch(geography=geo.make(state=25, county="*"), 

#  mass is now a new acs object with data for each county in
#  Massachusetts.  The "geography" function returns a dataframe of the
#  geographic metadata, which includes FIPS codes as the third
#  column.  So you can use it like this:

> all.tracts=geo.make(state="MA", 
  tract="*", check=T)
> acs.fetch(geography=all.tracts, table.number="B01003")

All the tracts in the entire country

In theory, you could even use this to get all the tracts from all the 3,225 counties in the country:

> all.counties=acs.fetch(geography=geo.make(state="*", county="*"),
> all.tracts=geo.make(state=as.numeric(geography(all.counties)[[2]]),,
  county=as.numeric(geography(all.counties)[[3]]), tract="*", check=T)

Unfortunately (or perhaps fortunately), this is just too much for R to download without changing some of the internal variables that limit this sort of thing — if you try, R will complain with “Error: evaluation nested too deeply: infinite recursion…” To prove to yourself that it works, you could limit the number of counties to just the first 250, and try that — it will get you from Autauga County, Alabama to Bent County, Colorado.

> some.counties=all.counties[1:250]
> some.tracts=geo.make(state=as.numeric(geography(some.counties)[[2]]), 
  county=as.numeric(geography(some.counties)[[3]]), tract="*", check=T)
>, table.number="B01003")

This is really a lot of data — on my machine, this took about 18 seconds, resulting in a new acs object containing population data on 11,872 different tracts. I haven’t checked to see what the upper limits are, but I imagine it wouldn’t take much to figure out a way to get tract-level data from all 3,225 counties. (But remember: with great power comes great responsibility — don’t be too rough on downloading stuff from the Census, even if it is free and easy.)

Using the built-in FIPS data

An alternative approach to these last two examples would be to use the FIPS datasets that we’ve built-in to the acs.R package. For example, the “fips.county” dataset includes the names of each county, by state. Feed this (or part of this) to your geo.make command and you can do all sorts of neat things.

> head(fips.county)
  State State.ANSI County.ANSI    County.Name ANSI.Cl
1    AL          1           1 Autauga County      H1
2    AL          1           3 Baldwin County      H1
3    AL          1           5 Barbour County      H1
4    AL          1           7    Bibb County      H1
5    AL          1           9  Blount County      H1
6    AL          1          11 Bullock County      H1

So instead of the last block above, you could do something like this:

> random.counties=sample(x=3225,size=20, replace=F)
> some.tracts=geo.make(state=fips.county[random.counties,1], 
  county=fips.county[random.counties,3], tract="*", check=T)
Testing geography item 1: Tract *, Ponce Municipio, Puerto Rico .... OK.
Testing geography item 2: Tract *, Alleghany County, North Carolina .... OK.
Testing geography item 3: Tract *, Wayne County, Pennsylvania .... OK.
Testing geography item 4: Tract *, Comerio Municipio, Puerto Rico .... OK.
Testing geography item 5: Tract *, Lafayette County, Wisconsin .... OK.
Testing geography item 6: Tract *, Hartford County, Connecticut .... OK.
Testing geography item 7: Tract *, Real County, Texas .... OK.
Testing geography item 8: Tract *, Costilla County, Colorado .... OK.
Testing geography item 9: Tract *, Sarpy County, Nebraska .... OK.
Testing geography item 10: Tract *, McLennan County, Texas .... OK.
Testing geography item 11: Tract *, Donley County, Texas .... OK.
Testing geography item 12: Tract *, McIntosh County, Georgia .... OK.
Testing geography item 13: Tract *, Chilton County, Alabama .... OK.
Testing geography item 14: Tract *, Richland County, Montana .... OK.
Testing geography item 15: Tract *, Mitchell County, Kansas .... OK.
Testing geography item 16: Tract *, Muscogee County, Georgia .... OK.
Testing geography item 17: Tract *, Martin County, Indiana .... OK.
Testing geography item 18: Tract *, Naguabo Municipio, Puerto Rico .... OK.
Testing geography item 19: Tract *, Aguas Buenas Municipio, Puerto Rico .... OK.
Testing geography item 20: Tract *, Washington County, Arkansas .... OK.

> # you may get different counties in your random set
> acs.fetch(geography=some.tracts, table.number="B01003")

Which will return population data from all the tracts in a random set of 20 counties.

Tags: , , , ,

Now on CRAN: acs.R version 1.0

Posted by Ezra Glenn on June 25, 2013
Census, Code, Self-promotion / No Comments

We are pleased to announce that the acs.R package is now ready for prime-time: version 1.0 was officially released last week and is now available on CRAN.1 This version, developed in partnership with the Puget Sound Regional Council, includes all the enhancements described in this post, plus additional tweaks, and lots of documentation.

Just to recap, as of version 1.0:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies for organize and download data; and
  • The package provides two special “lookup” tools to help filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. The acs.lookup function return new “acs.lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.

I’ve also updated the user guide (version 1.0), which includes step-by-step instructions for working with the package, plus an extended example in the appendix on using blockgroup-level ACS data to create your own neighborhood geographies. (You can also view the complete package manual from the CRAN site.)

Finally, if you’re interested in staying in touch with the ongoing development of the package, be sure to sign up for the acs.R user group mailing list: to register, visit


1 Note: the latest version of this package is actually 1.01, which includes a few additional big-fixes.

Tags: , , , , ,