Census

Presenting acs.R at the ACS Data User Conference

Posted by Ezra Glenn on April 06, 2015
Census, Code

On May 12, 2015, I’ll be presenting the acs.R package in a session of the American Community Survey Data User Group Conference in Hyattsville, MD. The paper, titled “Estimates with errors and errors with estimates: Using the R ‘acs’ package for analysis of American Community Survey data,” is available through the SSRN or my faculty publications webpage.

Better yet, the session will also include a presentation by Michael Laviolette, Dennis Holt, and Kristin K. Snow of the State of New Hampshire Department of Health and Human Services on “Using the R Language and ‘acs’ Package to Compile and Update a Social Vulnerability Index for New Hampshire.” It’s great to see how planners are using and extending this package in all sorts of exciting new settings and applications.

Click these links to see the complete program or to register for the conference.

acs.R useRs: Share your success stories

Posted by Ezra Glenn on October 20, 2014
Census, Code, Free Software, Open-Source

Have you been using the acs.R package to download and analyze Census data in your work? Do you have a story you’d be willing to share, to help us promote the package and show off all the cool ways people are using open source tools to make sense of data and help inform communities, policy-makers, and researchers? If so, please let us know: email your news or project descriptions to eglenn@mit.edu and we’ll post them here to inform and inspire our readers. Be sure to include good images, news coverage, quotations, or other materials to help tell the story — and feel free to include links, scripts, or examples as well.

Thanks!

PS: Don’t forget to subscribe to the acs.R mailing list to remain in touch with the growing acs.R user community.

acs.R version 1.2: Now, with 2012 data

Posted by Ezra Glenn on January 22, 2014
Census, Code, Self-promotion

As some of you have noticed, the new five-year Census ACS data has just come out, and is now available via the Census API. To make sure you are able to fetch the freshest possible data to play with in R, I’ve updated the acs.R package to version 1.2, which now includes full support for the 2008–2012 ACS data.

The latest version is now available on the CRAN repository. If you’ve already installed the package in the past, you can easily update with the update.packages() command; if you’ve never installed it, you can just as easily install it for the first time by typing install.packages("acs"). In either case, be sure to load the library after installing by typing library(acs), and install (or re-install) an API key with api.key.install(); see the documentation and the latest version of the acs user guide for more info.

To get the latest data, just continue to use the acs.fetch() function as usual, but specify endyear=2012. (By default, endyear is set to 2011 if no year is explicitly passed to acs.fetch, and I didn’t want to change this for fear of breaking existing user scripts. In the future, we might want to rethink this, so that it selects the most recent endyear by default. Thoughts?)

(Note: If you’re not sure which version you are using, you can always type packageVersion("acs") to find out.)
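A script that relies on the 2008-2012 data could check the installed version before it tries to fetch anything. The sketch below is one way to do that in base R; the helper name has_min_version is made up for illustration and is not part of the acs package.

```r
# Hypothetical helper (not part of acs): TRUE only if `pkg` is installed
# and its version is at least `min_version`.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

if (has_min_version("acs", "1.2")) {
  message("acs is recent enough to request endyear=2012")
} else {
  message("install or update acs first, e.g. install.packages(\"acs\")")
}
```

Either branch only prints a message, so the check is safe to leave at the top of a script.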

New choropleth package in R

Posted by Ezra Glenn on January 22, 2014
Census, Code, Free Software, Open-Source

A while back I posted a recipe (based on some great examples on the Revolution Analytics blog) showing how to use the acs package in R to create choropleth maps. Now, through the magic of open-source software development, the hard work of developer Ari Lamstein, and the generosity of his employers, this process has gotten even easier: I call your attention to Ari’s new choroplethr package for R.

Ari is a Senior Software Engineer at Trulia, where he works on data science and visualization, primarily related to real estate and housing markets. As part of the company’s “Innovation Week” he developed the choroplethr package, moving well beyond the sample scripts to create a powerful suite of mapping functions. With a single command, a user can now generate maps at the state, county, or zip code level, from any of the data available via the ACS.

http://tech.truliablog.com/files/2014/01/county-income.png

The package is not yet up on CRAN, but Ari promises that’s in the works; for now, you can learn more about it — including installation instructions using install_github() — on the Trulia Tech + Design blog. (I’m of course proud to note that the acs.R package lies at the foundation of these tools, doing the heavy-lifting of fetching and processing up-to-date data from the American Community Survey — but Ari’s work is already moving beyond these humble roots, allowing users to create choropleth maps of any data they can get their hands on….)

To learn more about the types of projects undertaken by Trulia staffers during Innovation week, see this short video. Congratulations — and thanks — to both Ari and Trulia for helping to drive innovation forward in R and other open source projects.

Using acs.R for a t-test

Posted by Ezra Glenn on November 06, 2013
Census

I’ve been asked to provide a very quick example of using the acs.R package to conduct a t-test of significance when comparing ACS data from two different geographical areas, so here goes.

Let’s look at the number of school-age children in different towns on Martha’s Vineyard. There are seven towns on this island, and luckily (like all New England towns) they are each represented as a different “county subdivision;” in this case, they together make up Dukes County. So we can get some quick age demographics for all seven of them in two quick commands:

> towns=geo.make(state="MA", county="Dukes", county.subdivision="*")
> towns.pop.sex=acs.fetch(geography=towns, table.number="B01001",
  col.names="pretty")
> # one more step just to shorten geography -- just town names
> geography(towns.pop.sex)[,1]=str_replace(
      geography(towns.pop.sex)[,1]," town.*","")

If you look at the column names using acs.colnames(towns.pop.sex), you will see that we are most interested in columns 4-6 (male children age 5-17) and 28-30 (female children, same ages). We also might need column 1 (the total population for the town), for the purpose of calculating percentages later on.
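The arithmetic behind this kind of comparison can be sketched in base R. This is only an illustration under stated assumptions, not the continuation of the example: the two towns and all numbers are invented, the estimates are treated as independent, and a normal approximation is used for the test statistic.

```r
# Illustrative numbers only (NOT real ACS values): school-age children in two
# hypothetical towns, with 90-percent margins of error as the ACS reports them.
est <- c(town_a = 620, town_b = 410)
moe <- c(town_a = 95, town_b = 80)

# ACS margins of error are published at the 90% level, so SE = MOE / 1.645.
se <- moe / 1.645

# Test statistic for the difference of two independent estimates.
diff_est <- est[["town_a"]] - est[["town_b"]]
diff_se <- sqrt(se[["town_a"]]^2 + se[["town_b"]]^2)
z <- diff_est / diff_se

# Two-sided p-value under a normal approximation.
p_value <- 2 * pnorm(-abs(z))

# Is the difference significant at the 90% level the Census Bureau uses?
significant_90 <- abs(z) > 1.645
```

With a fetched acs object, the estimate() and standard.error() accessors should be able to supply the inputs in place of the made-up vectors.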

Continue reading…

Census API Down Due to Breakdown of Federal Government

Posted by Ezra Glenn on October 02, 2013
Census, Code

Due to the shutdown of the federal government, it appears that many federal websites are down as well, including the Census API (see http://outage.census.gov/closed.txt). As a result, the acs.R package is currently unable to download data – sorry! If you try to use acs.fetch or anything related that requires the API, you will probably get the following error:

> acs.fetch(geo=geo.make(state=25), table.number="B01003")
Error in file(file, "rt") : cannot open the connection

I assume that once this all gets sorted out, the site will come right back as before.
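Scripts that run unattended may want to survive an outage like this rather than stop. A minimal sketch using base R’s tryCatch follows; safe_fetch and broken_fetch are hypothetical names introduced here, and broken_fetch only imitates the error above so the example runs without network access.

```r
# Hypothetical wrapper (not part of acs): run any fetch function and return
# NULL with a readable message instead of stopping the script on an error.
safe_fetch <- function(fetch_fun, ...) {
  tryCatch(
    fetch_fun(...),
    error = function(e) {
      message("Download failed (is the Census API up?): ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a failing download, so the sketch runs without network access.
broken_fetch <- function(...) stop("cannot open the connection")

result <- safe_fetch(broken_fetch, table.number = "B01003")
is.null(result)
```

In real use the first argument would be acs.fetch itself, for example safe_fetch(acs.fetch, geography=geo.make(state=25), table.number="B01003"), with a check for NULL before any further processing.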

acs.R Webinar: now online

Posted by Ezra Glenn on September 04, 2013
Census

For those who were unable to attend the recent acs.R webinar (or if you fell asleep halfway through), we’ve posted the complete video, the presentation slides, and the webinar demo script. Enjoy, and thanks again to Ray DiGiacomo and the Orange County R User Group for hosting this event.

acs.R Webinar: Aug 29

Posted by Ezra Glenn on August 07, 2013
Census

On August 29, 2013 (1:00PM Eastern/10:00AM Pacific) I’ll be presenting a free webinar on using the acs package, hosted by the Orange County R User Group. This one-hour on-line event will provide a general introduction to the acs package, including a live demonstration and time for Q&A.

Details:

  • Speaker: Ezra Haber Glenn, Lecturer at the Massachusetts Institute of Technology; maintainer, acs.R package
  • Moderator: Ray DiGiacomo, President of The Orange County R User Group
  • Registration (free): https://www3.gotomeeting.com/register/730429166

Please note that in addition to attending from your laptop or desktop computer, you can also attend from a Wi-Fi connected iPhone, iPad, Android phone or Android tablet by installing the GoToMeeting App.

Using acs.R to create choropleth maps

Posted by Ezra Glenn on July 15, 2013
Census, Code, Maps

Some time ago, FlowingData issued a challenge to create choropleth maps with open source tools, resulting in some nice little scripts using R’s ggplot2 and maps packages — all nicely covered on the Revolution Analytics blog. Some users have recently asked whether the acs package can be used to create similar maps, and the answer (of course) is yes. Here’s how.

For starters, to manage expectations, keep in mind that the map_data() function, which actually generates the geographic data you need to plot maps, does not currently provide boundary data for very many Census geographies — so sadly a lot of the new expanded support for various Census summary levels built into the acs package can’t be used. What you can do, however, is very easily plot state and county maps, which is what we’ll showcase below.

Secondly, the statistician in me feels compelled to point out one problem with using the acs package for these sorts of maps: choropleth maps are really fun to use to quickly show off the geographic distribution of some key statistic, but there is a price: in order to plot the data, we are really limited to a single number for each polygon. As a result, we can plot estimates from the ACS, but only if we are willing to ignore the margins of error and pretend that the data is less “fuzzy” than it really is. Given that the whole point of the acs package was to call attention to standard errors, and provide tools to work with them, it seems sort of counter-intuitive to use the package in this way; but given the ease of downloading ACS data through the acs.fetch() function, it may still be a good (although slightly irresponsible) use of the package.
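To make the “fuzziness” concrete, here is a small base R sketch of what a map throws away. The three counties and their values are invented for illustration; the only assumption carried over from the ACS is that margins of error are reported at the 90% level.

```r
# Illustrative values only (NOT real ACS data): estimated share of workers
# using public transit in three hypothetical counties, with 90% margins of error.
counties <- data.frame(
  name = c("county_a", "county_b", "county_c"),
  estimate = c(0.045, 0.012, 0.300),
  moe = c(0.004, 0.015, 0.020),
  stringsAsFactors = FALSE
)

# 90% interval implied by each margin of error, floored at zero.
counties$lower <- pmax(counties$estimate - counties$moe, 0)
counties$upper <- counties$estimate + counties$moe

# Coefficient of variation: standard error relative to the estimate.
counties$cv <- (counties$moe / 1.645) / counties$estimate

# Flag estimates whose interval reaches zero; a map would still draw them
# in a single solid color.
counties$interval_hits_zero <- counties$lower == 0
counties
```

The second county is the cautionary case: its interval runs all the way down to zero, yet on a choropleth it would be shaded as confidently as the other two.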

Given all that, I’ll step back down off my high horse, and show you how to make some maps, drawing heavily on the scripts provided by Hadley Wickham for using the ggplot2 package. For this example, let’s look at the percentage of people who take public transportation to work in the U.S., by county.

For starters, we’ll need to install and load the required packages:

> install.packages("acs")
> install.packages("ggplot2")
> install.packages("maps")
> library(acs)
> library(ggplot2)
> library(maps)

If you haven’t already obtained and installed an API key from the Census, you’ll need to do that as well — see ?api.key.install or check section 3.3 in the user guide.

Next, we use the map_data function to create some map boundary files to use later.

# load the boundary data for all counties
> county.df=map_data("county")
# rename fields for later merge
> names(county.df)[5:6]=c("state","county")
> state.df=map_data("state")

Turning to the acs package, we create a new geo.set consisting of all the counties in all the states, and in a single call to acs.fetch download Census table B08301, which contains data on Means of Transportation to Work. (If we didn’t know the table number we needed, we could use acs.lookup() to search for likely candidates, or even pass search strings directly to the acs.fetch() function.)

# wow! a single geo.set to hold all the counties...?
> us.county=geo.make(state="*", county="*")
# .. and a single command to fetch the data...!
> us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")

The data we’ve fetched includes estimates of raw numbers of workers, not percentages, so we’ll need to do some division. Since we are interested in a proportion (and not a ratio), we need to use the divide.acs function, not just “/” (see footnote 1). For our dataset, the 10th column is the number of workers taking public transportation to work, and the 1st column is the total number of workers in the county. (See acs.colnames(us.transport) to verify this.) After we complete the division, we extract the estimates into a new data.frame, along with state and county names for each. (We need to do a little string manipulation to make these fields match with those from the map_data function above.)

> us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
> pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
# this next step is all for Louisiana!
> pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
# clean up county names and find the states
> pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
> pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
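The string clean-up above can be tried on a few labels without touching the API. This sketch assumes geography labels of the form “Name County, State”, which is what the regular expressions imply; the three labels are just samples typed in by hand.

```r
# Sample geography labels in the "Name County, State" form assumed above.
labels <- c("Autauga County, Alabama",
            "Acadia Parish, Louisiana",
            "Alexandria city, Virginia")

# Same three substitutions as in the post, applied to a plain character vector.
labels <- gsub("Parish", "County", labels)
state <- tolower(gsub("^.*County, ", "", labels))
county <- tolower(gsub(" County,.*", "", labels))

# Labels without the word "County" pass through unsplit, so state and county
# end up identical and cannot match a state name in the map data.
unsplit <- state == county

cleaned <- data.frame(state = state, county = county,
                      stringsAsFactors = FALSE)
cleaned[unsplit, ]
```

Any rows listed at the end (independent cities, for instance) will silently drop out of the later merge unless they are handled separately.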

Next, following Wickham’s script, we merge the boundaries with the data into a new data.frame (called choropleth) and reorder and recode it for our map levels.

> choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
> choropleth=choropleth[order(choropleth$order), ]
> choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)
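To see what the cut() call does to the proportions, here is a small base R sketch using the same breaks on a handful of made-up values.

```r
# Same breaks as in the map script; the shares themselves are made up.
breaks <- c(0, .01, .02, .03, .04, .05, .1, 1)
shares <- c(0, 0.015, 0.05, 0.3, NA)

# Intervals are closed on the right; include.lowest=TRUE keeps an exact 0
# in the first bin instead of turning it into NA.
bins <- cut(shares, breaks = breaks, include.lowest = TRUE)

levels(bins)       # the seven map classes
as.integer(bins)   # which class each share landed in
table(bins, useNA = "ifany")
```

Seven classes fit comfortably within the “Purples” palette used in the plot, and a missing proportion simply stays NA rather than landing in a bin.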

And voila – a single call to ggplot and we have our map!

> ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")

./acs_scripts/choropleth_county_pub_trans.jpg

(For those who would like to see the entire script in all its efficient 14-line glory, I’ve pasted it below for easy cutting and pasting.)

install.packages("acs")
install.packages("ggplot2")
install.packages("maps")
library(acs)
library(ggplot2)
library(maps)
county.df=map_data("county")
names(county.df)[5:6]=c("state","county")
state.df=map_data("state")
us.county=geo.make(state="*", county="*")
us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")
us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
choropleth=choropleth[order(choropleth$order), ]
choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)
ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")


Footnotes:

1 Technically, since we are going to ignore the standard errors in our map, this could just be a standard division using “/”, but we might later want to look at the margins of error, etc. (For more on this issue, see ?divide.acs.)
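For the curious, the difference between the two methods can be sketched with the standard Census Bureau approximation formulas for derived estimates. The helper names below are hypothetical and this is not the acs package’s own code; it only assumes that divide.acs follows these formulas, as its help page suggests. The county figures are made up.

```r
# Hypothetical helpers (not the acs implementation) following the standard
# Census Bureau approximations for derived estimates. Scalar inputs only.

# Ratio: numerator is NOT a subset of the denominator.
se_ratio <- function(x, y, se_x, se_y) {
  r <- x / y
  sqrt(se_x^2 + r^2 * se_y^2) / y
}

# Proportion: numerator IS a subset of the denominator.
# If the term under the root is negative, fall back to the ratio formula.
se_proportion <- function(x, y, se_x, se_y) {
  p <- x / y
  radicand <- se_x^2 - p^2 * se_y^2
  if (radicand < 0) {
    return(se_ratio(x, y, se_x, se_y))
  }
  sqrt(radicand) / y
}

# Made-up county: 1,200 transit commuters out of 20,000 workers.
x <- 1200; se_x <- 150
y <- 20000; se_y <- 600

c(estimate = x / y,
  se_proportion = se_proportion(x, y, se_x, se_y),
  se_ratio = se_ratio(x, y, se_x, se_y))
```

The estimate is the same either way; only the standard error changes, and the proportion version is the smaller of the two whenever its formula applies.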


acs.R version 1.1: PUMAs and Zip Codes and MSAs, Oh My!

Posted by Ezra Glenn on July 14, 2013
Census, Code, Self-promotion

Development continues on the acs package for R, with the latest update (version 1.1) now officially available on the CRAN repository. If you’ve already installed the package in the past, you can easily update with the update.packages() command; if you’ve never installed it, you can just as easily install it for the first time by typing install.packages("acs"). In either case, be sure to load the library after installing by typing library(acs), and install (or re-install) an API key with api.key.install(); see the documentation and the latest version of the acs user guide (which still references version 1.0).

Beyond improvements described in a previous post about version 1.0, the most significant change in the latest version is support for many more combinations of census geography via the geo.make function. As described in the manual and on-line help, users can now specify options to create user-defined geographies composed of combinations of states, counties, county subdivisions, tracts, places, block groups (all available in the previous version), plus many more: public use microdata areas (PUMAs), metropolitan statistical areas (MSAs), combined statistical areas (CSAs), zip code tabulation areas, census regions and divisions, congressional districts and state legislative districts (both upper and lower chambers), American Indian Areas, state school districts (of various types), New England City and Town Areas (NECTAs), and census urban areas. These geographies can be combined to create 25 different census summary levels, which can then be bundled together to make even more complex geo.sets.

Once created and saved, these new user-defined geo.sets can be fed into the existing acs.fetch function to immediately download data from the ACS for these areas, combining them as desired in the process (and handling all those pesky estimates and margins of error in statistically-appropriate ways).

We encourage you to update to the latest version and begin to explore the full power of the census data now available through the Census American Community Survey API. (And be sure to subscribe to the acs.R user group mailing list to be informed of future improvements.)
