Code

Now on CRAN: acs.R version 1.0

Posted by Ezra Glenn on June 25, 2013
Census, Code, Self-promotion / No Comments

We are pleased to announce that the acs.R package is now ready for prime-time: version 1.0 was officially released last week and is now available on CRAN.1 This version, developed in partnership with the Puget Sound Regional Council, includes all the enhancements described in this post, plus additional tweaks, and lots of documentation.

Just to recap, as of version 1.0:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies for organizing and downloading data; and
  • The package provides two special “lookup” tools to help users filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. The acs.lookup() function returns new “acs.lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.

I’ve also updated the user guide (version 1.0), which includes step-by-step instructions for working with the package, plus an extended example in the appendix on using blockgroup-level ACS data to create your own neighborhood geographies. (You can also view the complete package manual from the CRAN site.)

Finally, if you’re interested in staying in touch with the ongoing development of the package, be sure to sign up for the acs.R user group mailing list: to register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.

Footnotes:

1 Note: the latest version of this package is actually 1.01, which includes a few additional bug-fixes.


acs-r Mailing List: keep in the loop

Posted by Ezra Glenn on April 24, 2013
Census, Code, Self-promotion / No Comments

We’re pleased to announce the creation of a new mailing list for the acs.R package. The “acs” package allows users to download, manipulate, analyze, and visualize data from the American Community Survey in R; the “acs-r” e-mail list allows members to keep in touch and share information about the package, including updates from the development team concerning improvements, user questions and help requests, worked examples, and more. To register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.


acs.R: a worked example using blockgroup-level data

Posted by Ezra Glenn on March 11, 2013
Census, Code / 2 Comments

A very nice user wrote the following in an email to me about the latest version of the acs.R package:

 
> Thanks for providing such a wonderful package in R. I'm having
> difficulty defining a geo at the block group level. Would you mind
> sharing an example with me?

I responded via email, but thought that my answer — which took the form of a short worked-example — might be helpful to others, so I am posting it here as well. Here’s what I said:

To showcase how the package can create new census geographies based on stuff like blockgroups, let’s look in my home state of Massachusetts, in Middlesex County. If I wanted to get info on all the block groups for tract 387201,1 I could create a new geo like this:

> my.tract=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group="*", check=T)
Testing geography item 1: Tract 387201, Blockgroup *, 
  Middlesex County, Massachusetts .... OK.
> 

(This might be a useful first step, especially if I didn’t know how many block groups there were in the tract, or what they were called. Also, note that check=T is not required, but can often help ensure you are dealing with valid geos.)

If I then wanted to get very basic info on these block groups – say, table number B01003 (Total Population), I could type:

> total.pop=acs.fetch(geo=my.tract, table.number="B01003")
> total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 3 1010 +/- 156
Block Group 4 938 +/- 214 
> 

Here we can see that block.group="*" has yielded the actual four block groups for the tract.

Now, if instead of wanting all of them, we only wanted the first two, we could just type:

> my.bgs=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group=1:2, check=T)
Testing geography item 1: Tract 387201, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387201, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
> 

And then:

> bg.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> bg.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
> 

Now, if we wanted to add in some blockgroups from tract 387100 (a.k.a. “tract 3871” — but remember: we need those trailing zeroes) – say, blockgroups 2 and 3 – we could enter:

> my.bgs=my.bgs+geo.make(state="MA", county="Middlesex", 
  tract=387100, block.group=2:3, check=T)
Testing geography item 1: Tract 387100, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387100, Blockgroup 3, 
  Middlesex County, Massachusetts .... OK.

And then:

> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 2 827 +/- 171 
Block Group 3 1821 +/- 236
> 

Note that the short rownames can be confusing – as in this example — but if you type:

> geography(new.total.pop)
           NAME state county  tract blockgroup
1 Block Group 1    25     17 387201          1
2 Block Group 2    25     17 387201          2
3 Block Group 2    25     17 387100          2
4 Block Group 3    25     17 387100          3
> 

you can see that the two entries for “Block Group 2” are actually in different tracts. (Also note: you can combine block groups and other levels of geography, all in a single geo object…)
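If the repeated names get in the way in your own tables, one option is to build longer labels from the geography columns. Here is a minimal base-R sketch: the data frame simply re-types the geography() output shown above, and the names geo and full.names are made up for this illustration rather than taken from the package.

```r
# Re-create the relevant columns of the geography() output shown above.
geo <- data.frame(
  NAME = c("Block Group 1", "Block Group 2", "Block Group 2", "Block Group 3"),
  tract = c(387201L, 387201L, 387100L, 387100L),
  blockgroup = c(1L, 2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Append the tract number so that every label is unique.
full.names <- paste0(geo$NAME, ", Tract ", geo$tract)
full.names

# Confirm that no label is repeated (anyDuplicated() returns 0 if none are).
anyDuplicated(full.names)
```

With the tract number attached, the two “Block Group 2” rows can be told apart at a glance.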

And now, to show off the coolest part! Let’s say I don’t just want to get data on the four blockgroups, but I want to combine them into a single new geographic entity. Before downloading, I could simply say:

> combine(my.bgs)=T
> combine.term(my.bgs)="Select Blockgroups"
> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                   B01003_001               
Select Blockgroups 6281 +/- 481.733328720362
>

And see – voila! – it sums the estimates and deals with the margins of error, so you don’t need to get your hands dirty with square roots and standard errors and all that messy stuff.
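For the curious, the arithmetic behind that combined row can be sketched in a few lines of base R. This is only an illustration, not the package's internal code: it assumes the usual approach of adding the estimates and taking the square root of the sum of the squared margins of error, and the variable names are invented for the example. The inputs are the four block-group figures from the earlier output.

```r
# Estimates and 90% margins of error for the four block groups,
# copied from the acs.fetch() output shown earlier in this post.
est <- c(2681, 952, 827, 1821)
moe <- c(319, 213, 171, 236)

# Estimates simply add up.
combined.est <- sum(est)

# Margins of error combine as the square root of the sum of squares
# (this assumes the sampling errors are independent of one another).
combined.moe <- sqrt(sum(moe^2))

cat(combined.est, "+/-", combined.moe, "\n")
```

The result agrees with the “Select Blockgroups” row above: 6281, plus or minus roughly 481.73.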

You can even create interesting nested geo.sets, where some of the lower levels are combined, like this:

> combine.term(my.bgs)="Select Blockgroups, 
  Tracts 387100 and 387201"
> more.bgs=c(my.bgs, geo.make(state="MA", 
  county="Middlesex", tract=370300, block.group=1:2, check=T), 
  geo.make(state="MA", county="Middlesex", tract=370400, 
  block.group=1:3, combine=T, combine.term="Select Blockgroups, 
  Tract 3703", check=T)) 
Testing geography item 1: Tract 370300, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370300, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 1: Tract 370400, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370400, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 3: Tract 370400, Blockgroup 3, 
 Middlesex County, Massachusetts .... OK.
> more.total.pop=acs.fetch(geo=more.bgs, table.number="B01003")
> more.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                                             B01003_001               
Select Blockgroups, Tracts 387100 and 387201 6281 +/- 481.733328720362
Block Group 1                                315 +/- 132              
Block Group 2                                1460 +/- 358             
Select Blockgroups, Tract 3703               2594 +/- 487.719181496894
> 

In closing: I hope this helps, and be sure to contact me if you have other questions/problems about using the package.

Footnotes:

1 Note that tracts are often referred to in a strange “four-digit+decimal extension” shorthand, so “tract 387201” may also be known as “tract 3872.01”. When working with this package, always use six-digit tract numbers without the decimal point. If the tract number seems to be only four digits long, add two “trailing” zeroes at the end.
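If you have a list of tracts in the shorthand form, the conversion can be scripted. The helper below is hypothetical (tract.six is not a function in the acs package); it is a small sketch that assumes its input is always in the “four-digit+decimal” shorthand, so do not feed it numbers that are already six digits long.

```r
# Hypothetical helper (not part of the acs package): convert the
# "four-digit+decimal" shorthand into a six-digit tract number.
tract.six <- function(x) {
  # Multiply by 100 to shift the decimal extension into the last two
  # digits, then round to remove any floating-point fuzz.
  as.integer(round(as.numeric(x) * 100))
}

tract.six("3872.01")  # shorthand with a decimal extension
tract.six("3871")     # shorthand with no extension: gains two trailing zeroes
```

The results can then be passed as the tract= argument to geo.make(), as in the examples above.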


Major improvements to acs.R: sneak peek at version 1.0

Posted by Ezra Glenn on February 05, 2013
Census, Code / 4 Comments

It’s been a while since I last updated the acs.R package, but as noted here, I’ll be using CityState to provide updates and test-versions of the package prior to uploading to CRAN. I’m happy to report that we now have a near-final package of version 1.0.

The most significant improvements to the package (beyond those mentioned previously) are the following:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies for organizing and downloading data; and
  • The package provides two special “lookup” tools to help filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. These functions return new R “lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.

I want to thank the very kind folks at the Puget Sound Regional Council, who have been supporting the development of this package (in exchange for some special attention to scripts and functions they really want to include for themselves and their member communities). They continue to provide excellent help and advice “from the trenches” as we refine the package.

If you’re interested in trying out the new version, you can download it below, along with a brief set of “Introductory Notes” written for the team at PSRC. (Users may also want to check out the manual for the previous version of the package and this article from 2011 on the package.)


acs Package at Upcoming Conference: UseR! 2012

Posted by Ezra Glenn on April 09, 2012
Census, Code, Self-promotion / No Comments

I’m happy to report that I’ll be giving a paper on my acs package at the 8th annual useR! conference, coming June 12-15 to Vanderbilt University in Nashville, TN. The paper is titled “Estimates with Errors and Errors with Estimates: Using the R acs Package for Analysis of American Community Survey Data.” Here’s the abstract:


"Estimates with Errors and Errors with Estimates: Using the R acs
Package for Analysis of American Community Survey Data"
Ezra Haber Glenn

Over the past decade, the U.S. Census Bureau has implemented the
American Community Survey (ACS) as a replacement for its traditional
decennial ``long-form'' survey.  Last year—for the first time
ever—ACS data was made available at the census tract and block group
level for the entire nation, representing geographies small enough to
be useful to local planners; in the future these estimates will be
updated on a yearly basis, providing much more current data than was
ever available in the past.  Although the ACS represents a bold
strategy with great promise for government planners, policy-makers,
and other advocates working at the neighborhood scale, it will require
them to become comfortable with statistical techniques and concerns
that they have traditionally been able to avoid.

To help with this challenge the author has been working with
local-level planners to determine the most common problems associated
with using ACS data, and has implemented these functions as a package
in R.  The package—currently hosted on CRAN in version 0.8—defines
a new ``acs'' class object (containing estimates, standard errors, and
metadata for tables from the ACS), with methods to deal appropriately
with common tasks (e.g., combining subgroups or geographies,
mathematical operations on estimates, tests of significance, plots of
confidence intervals, etc.).

This paper will present both the use and the internal structure of the
package, with discussion of additional lines of development.

Hope to see you all there!


Constantly Improving: acs development versions

Posted by Ezra Glenn on March 29, 2012
Census, Code / No Comments

As noted elsewhere here on CityState, I’ve developed a package for working with data from the American Community Survey in the R statistical computing language. The most recent official version of the package is 0.8, which can be found on CRAN. Since the package is still in active development, I’ve decided to provide development snapshots here, for users who are looking to work with the latest code as I develop it.

I’m hoping that the next major release will be version 1.0, due out sometime this spring. As I work towards that, here is version 0.8.1, which can be considered the first “snapshot” headed toward this release.

acs_0.8.1.tar.gz

To install, simply download, start R, and type:

 

> install.packages("path/to/file/acs_0.8.1.tar.gz", repos=NULL, type="source")
> library(acs)

Updates include:

  • read.acs can now accept either a csv or a zip file downloaded directly from the FactFinder site, and it does a much better job (a) guessing how many rows to skip, (b) figuring out how to generate intelligent variable names for the columns, and (c) dealing with arcane non-numeric symbols used by FactFinder for some estimates and margins of error.
  • plot now includes a true.min= option, which allows you to specify whether you want to allow error bars to span into negative values (true.min=T, the default), or to bound them at zero (true.min=F – or some other numeric value). This seemed necessary because it looks silly to say “The number of children who speak Spanish in this tract is 15, plus or minus 80…” At the same time, if the variable turns out to be something like the difference between the income of Males and the income of Females in the geography, a negative value may make a lot of sense, and should be plotted as such.
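To make the true.min idea concrete, here is a tiny base-R sketch of the two choices for the lower end of an error bar. It is not the plotting code from the package, just the underlying arithmetic, and the numbers are the illustrative “15, plus or minus 80” from the example above.

```r
# Illustrative values only: an estimate of 15 with a margin of error of 80.
est <- 15
moe <- 80

# Lower end of the error bar if negative values are allowed.
lower.true <- est - moe

# Lower end of the error bar if it is bounded at zero instead.
lower.bounded <- max(est - moe, 0)

# The upper end is the same either way.
upper <- est + moe

c(lower.true = lower.true, lower.bounded = lower.bounded, upper = upper)
```

For a count such as the number of children, the bounded version is the sensible one; for a difference between two incomes, the unbounded version is.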


acs Package Updated: version 0.8 now on CRAN

Posted by Ezra Glenn on March 18, 2012
Census, Code / 2 Comments

I’ve just released a new version of my acs package for working with the U.S. Census American Community Survey data in R, available on CRAN. The current version 0.8 includes all the original version 0.6 code, plus a whole lot more features and fixes. Some highlights:

  • An improved read.acs function for importing data downloaded from the Census American FactFinder site.
  • rbind and cbind functions to help create larger acs objects from smaller ones.
  • A new sum method to aggregate rows or columns of ACS data, dealing correctly with both estimates and standard errors.
  • A new apply method to allow users to apply virtually any function to each row or column of an acs data object.
  • A snazzy new plot method, capable of plotting both density plots (for estimates of a single geography and variable) and multiple estimates with error bars (for estimates of the same variable over multiple geographies, or vice versa). See sample plots below.

 

  • New functions to deal with adjusting the nominal values of currency from different years for the purpose of comparing between one survey and another. (See currency.convert and currency.year in the documentation.)
  • A new tract-level dataset from the ACS for Lawrence, MA, with dollar value currency estimates (useful to show off the aforementioned new currency conversion functions).
  • A new prompt method to serve as a helper function when changing geographic rownames or variable column names.
  • Improved documentation on the acs class and all of these various new functions and methods, with examples.

With this package, once you’ve found and downloaded your data from FactFinder, you can read it into R with a single command, aggregate multiple tracts into a neighborhood with another, generate a table of estimates and confidence intervals for your neighborhood with a third command, and produce a print-ready plot of your data (complete with error bars for the margins of error) with a fourth:

my.data=read.acs("some_data.csv")
my.neighborhood=apply(my.data, FUN="sum", MARGIN=1, agg.term="My.Neighborhood") 
confint(my.neighborhood, conf.level=.95) 
plot(my.neighborhood, col="blue", err.col="violet", pch=16)
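A note on the conf.level=.95 argument: ACS margins of error are published at the 90% confidence level, so a 95% interval is a little wider. The base-R sketch below shows the conversion by hand, assuming a normal approximation; it is not the package's own code, and the estimate and margin of error are made-up numbers chosen only for illustration.

```r
# Made-up numbers for illustration: an estimate with a 90% margin of error.
est <- 1000
moe90 <- 100

# Recover the standard error from the 90% margin of error
# (qnorm(0.95) is approximately 1.645).
se <- moe90 / qnorm(0.95)

# Scale the standard error up to a 95% margin of error
# (qnorm(0.975) is approximately 1.96).
moe95 <- se * qnorm(0.975)

# Lower and upper ends of the 95% confidence interval.
c(lower = est - moe95, upper = est + moe95)
```

The 95% margin of error comes out roughly 19% wider than the published 90% figure.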

Already this package has come a long way, in large part thanks to the input of R users, so please check it out and let me know what you think — and how I can make it better.


org2blog

Posted by Ezra Glenn on February 18, 2012
Code / 1 Comment

For those of you who’ve noticed that I’ve started being a more active blogger over the last few weeks, there’s a good explanation: I’ve discovered org2blog.

Given that I try to live as much of my life as possible in emacs (or at least as much of my virtual life as possible), org2blog is a godsend. Using emacs’ excellent org-mode has already revolutionized my writing, coding, and the way I organize my time and my projects, and now, through this intuitive and clever extension, it is helping organize my blog activity as well.

Others (for example, here and here) have already written extensively on the how and the why of org2blog: basically, you install org-mode (already built in to most modern emacsen), load a few more special .el files, and with a little customization you’re good to go.

The real magic, however, comes in the use of org-mode to bring order to the chaos of your thoughts, so that blog posts are planned, scheduled, and reflective – and the resulting blog is actually organized and structured (as opposed to the random “shopping lists of my thoughts” model).

Continue reading…
