Posted by Ezra Glenn
on March 29, 2012
As noted elsewhere here on CityState, I’ve developed a package for working with data from the American Community Survey in the R statistical computing language. The most recent official version of the package is 0.8, which can be found on CRAN. Since the package is still in active development, I’ve decided to provide development snapshots here, for users who are looking to work with the latest code as I develop it.
I’m hoping that the next major release will be version 1.0, due out sometime this spring. As I work towards that, here is version 0.8.1, which can be considered the first “snapshot” headed toward this release.
To install, simply download, start R, and type:
read.acs can now accept either a csv or a zip file downloaded directly from the FactFinder site, and it does a much better job of (a) guessing how many rows to skip, (b) generating intelligent variable names for the columns, and (c) dealing with the arcane non-numeric symbols FactFinder uses for some estimates and margins of error.
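To give a sense of what (c) involves, here is a minimal base-R sketch of turning FactFinder-style text cells into numbers. It is not the read.acs code: the helper clean.ff.values is hypothetical, and its treatment of the symbols is an assumption. A "*****" in a margin-of-error cell (the Census Bureau's marker for a controlled estimate) is treated as zero error, and other codes such as "(X)" or "-" simply become NA.

```r
# Hypothetical helper, not part of the acs package: convert FactFinder-style
# text cells to numbers.
# Assumptions: "*****" in a margin-of-error column marks a controlled estimate
# (treated here as zero error); other non-numeric codes become NA.
clean.ff.values <- function(x, is.moe = FALSE) {
  x <- trimws(as.character(x))
  controlled <- !is.na(x) & x == "*****"
  # strip thousands separators and a leading "+/-", if present
  x <- gsub(",", "", x, fixed = TRUE)
  x <- sub("^\\+/-", "", x)
  # codes such as "(X)" or "-" fail to parse and become NA
  out <- suppressWarnings(as.numeric(x))
  if (is.moe) out[controlled] <- 0
  out
}

est <- clean.ff.values(c("1,234", "(X)", "56"))
moe <- clean.ff.values(c("+/-80", "*****", "-"), is.moe = TRUE)
```

Keeping unparseable codes as NA, rather than silently zeroing them, makes it obvious downstream which cells had no usable value.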
plot now includes a true.min= option, which lets you specify whether error bars may extend into negative values (true.min=T, the default) or should be bounded at zero (true.min=F), or at some other value if you pass a number instead. This seemed necessary because it looks silly to say “The number of children who speak Spanish in this tract is 15, plus or minus 80…” At the same time, if the variable is something like the difference between the income of males and the income of females in the geography, a negative value may make a lot of sense, and should be plotted as such.
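The logic of the three true.min cases can be sketched as a small function that computes only the lower end of an error bar. This is a hypothetical helper (error.bar.lower), written to mirror the behavior described above; it is not the code inside the package's plot method.

```r
# Hypothetical helper, not the acs plot code: lower end of an error bar,
# following the three true.min cases described above.
#   true.min = TRUE   -> no bound, negative values allowed
#   true.min = FALSE  -> bound at zero
#   true.min = number -> bound at that number
error.bar.lower <- function(estimate, moe, true.min = TRUE) {
  lower <- estimate - moe
  if (isTRUE(true.min)) return(lower)
  floor.value <- if (identical(true.min, FALSE)) 0 else true.min
  pmax(lower, floor.value)
}

error.bar.lower(15, 80)                   # unbounded lower end
error.bar.lower(15, 80, true.min = FALSE) # bounded at zero
```

With the Spanish-speaking children example, the unbounded lower end is -65, while true.min=F clips it to 0.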
Posted by Ezra Glenn
on March 18, 2012
I’ve just released a new version of my acs package for working with the U.S. Census American Community Survey data in R, available on CRAN. The current version, 0.8, includes all the original version 0.6 code, plus a whole lot more features and fixes. Some highlights:
- An improved read.acs function for importing data downloaded from the Census American FactFinder site.
- New cbind functions to help create larger acs objects from smaller ones.
- A new sum method to aggregate rows or columns of ACS data, dealing correctly with both estimates and standard errors.
- A new apply method to allow users to apply virtually any function to each row or column of an acs data object.
- A snazzy new plot method, capable of plotting both density plots (for estimates of a single geography and variable) and multiple estimates with error bars (for estimates of the same variable over multiple geographies, or vice versa). See sample plots below.
- New functions to deal with adjusting the nominal values of currency from different years for the purpose of comparing between one survey and another. (See currency.year in the documentation.)
- A new tract-level dataset from the ACS for Lawrence, MA, with dollar value currency estimates (useful to show off the aforementioned new currency conversion functions).
- A new prompt method to serve as a helper function when changing geographic rownames or variable column names.
- Improved documentation on the acs class and all of these various new functions and methods, with examples.
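Why does aggregation need its own sum method? Because the errors have to be combined along with the estimates. A common approximation in Census Bureau guidance treats the standard error of a sum as the square root of the sum of the squared standard errors. The base-R sketch below applies that rule to made-up tract figures. neighborhood.total is a hypothetical name, and the package's own sum method may handle additional cases differently.

```r
# Sketch (hypothetical helper, not the acs sum method): combine tract-level
# estimates into one neighborhood total using the approximation
#   SE(total) = sqrt(sum of SE_i^2)
# Inputs are 90% margins of error, converted with SE = MOE / 1.645.
neighborhood.total <- function(estimates, moes, z = 1.645) {
  se <- moes / z
  total.se <- sqrt(sum(se^2))
  c(estimate = sum(estimates), se = total.se, moe = total.se * z)
}

tracts.est <- c(120, 340, 95)  # made-up tract estimates
tracts.moe <- c(40, 75, 30)    # made-up 90% margins of error
neighborhood.total(tracts.est, tracts.moe)
```

Note that the combined margin of error (about 90 here) is smaller than the simple sum of the three margins (145), which is why just adding up the error columns would overstate the uncertainty.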
With this package, once you’ve found and downloaded your data from FactFinder, you can read it into R with a single command, aggregate multiple tracts into a neighborhood with another, generate a table of estimates and confidence intervals for your neighborhood with a third command, and produce a print-ready plot of your data (complete with error bars for the margins of error) with a fourth:
my.neighborhood=apply(my.data, FUN="sum", MARGIN=1, agg.term="My.Neighborhood")
plot(my.neighborhood, col="blue", err.col="violet", pch=16)
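For the confidence-interval step, the underlying arithmetic is simple. ACS margins of error are published at the 90% confidence level, so SE = MOE / 1.645, and an interval at another level rescales the standard error by the matching normal quantile. Here is a base-R sketch using a hypothetical moe.to.ci function; it is not the package's own command for this step.

```r
# Sketch (hypothetical helper): confidence interval from an ACS estimate and
# its published 90% margin of error, assuming the normal approximation used
# to construct ACS margins (MOE = 1.645 * SE).
moe.to.ci <- function(estimate, moe, level = 0.90) {
  se <- moe / 1.645
  z <- qnorm(1 - (1 - level) / 2)  # about 1.645 at 90%, 1.96 at 95%
  cbind(lower = estimate - z * se, upper = estimate + z * se)
}

moe.to.ci(555, 90)                # close to 555 plus or minus 90
moe.to.ci(555, 90, level = 0.95)  # wider interval at the 95% level
```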
Already this package has come a long way, in large part thanks to the input of R users, so please check it out and let me know what you think, and how I can make it better.