
acs version 2.0: now on CRAN

Posted by Ezra Glenn on March 14, 2016

After far too long, we are pleased to release version 2.0 of the acs package.

The biggest improvement is full support for all ACS, SF1, and SF3 data currently available via the Census API, including ACS data from 2005-2014 and Decennial data from 1990, 2000, and 2010. (See below for more info.)

1 Downloading and installing

To install the updated version, simply fire up an R session and type:

> install.packages("acs", clean=T)

2 Learn more

To learn more about the package, see the documentation and user guide on CRAN. And be sure to join the acs.R User Group Mailing List.

3 Notes and updates

A few notes about this new package:

  • API Keys: by default, when R updates a package, it overwrites the old package files. Unfortunately, that is where archived api.keys get saved by api.key.install(). As part of the version 2.0 package installation, “configure” and “cleanup” scripts can be run which try to migrate the key to a new location. If this fails, the install script will suggest that users run api.key.migrate() after installation, which might resolve the issue. At worst, if both methods fail, a user can simply re-run api.key.install() with the original key and be good to go.
  • endyear now required: under the old package, acs.fetch and acs.lookup would default to endyear=2011 when no endyear was provided. This seemed smart at the time – 2011 was the most recent data available – but it is becoming increasingly absurd. One solution would have been to change the default to be whatever data is most recent, but that would have the unintended result of making the same script run differently from one year to the next: bad mojo. So the new preferred “version 2.0 solution” is to require users to explicitly indicate the endyear that they want to fetch each time. Note that this may require some changes to existing scripts.
  • ACS Data Updates: the package now provides on-board support for all endyears and spans currently available through the API, including:
    • American Community Survey 5-Year Data (2005-2009 through 2010-2014)
    • American Community Survey 3-Year Data (2013, 2012)
    • American Community Survey 1-Year Data (2014, 2013, 2012, 2011)

    See http://www.census.gov/data/developers/data-sets.html for more info, including guidance about which geographies are provided for each dataset.

  • Decennial Census Data: for the first time ever, the package now also includes the ability to download Decennial Data from the SF1 and SF3, using the same acs.fetch() function used for ACS data.
    • SF1/Short-Form (1990, 2000, 2010)
    • SF3/Long-Form (1990, 2000)1

    When fetched via acs.fetch(), this data is downloaded and converted to acs-class objects. (Note: standard errors for Decennial data will always be zero, which is technically not correct for SF3 survey data, but no margins of error are reported by the API.) See http://www.census.gov/data/developers/data-sets/decennial-census-data.html for more info.

    Also note that Census support for the 1990 data is a bit inconsistent – the variable lookup tables were not in the same format as the others, and far less descriptive information has been provided about table and variable names. This can make it tricky to find and fetch data, but if you know what you want, you can probably find it; looking at the files in the package’s extdata directory might help give you a sense of what the variable codes and table numbers look like.

  • Other improvements/updates/changes:
    • CPI tables: the CPI tables used for currency.year() and currency.convert() have been updated to include data up through 2015.
    • acs.fetch with saved acs.lookup results: the results of acs.lookup can still be saved and passed to acs.fetch via the “variable=” option,2 with a slight change: under v. 1.2, the passed acs.lookup results would overrule any explicit endyear or span; with v. 2.0, the opposite is true (the endyear and span in the acs.lookup results are ignored by acs.fetch). This may seem insignificant, but it will eventually be important, when users want to fetch data from years that are more recent than the version of the package, and need to use old lookup results to do so.
    • divide.acs fixes: the package includes a more robust divide.acs() function, which handles zero denominators better and takes full advantage of the potential for reduced standard errors when dividing proportions.
    • acs.tables.install: to obtain variable codes and other metadata needed to access the Census API, both acs.fetch and acs.lookup must consult various XML lookup files, which are provided by the Census with each data release. To keep the size of the acs package within CRAN guidelines and to ensure tables will always be up-to-date, as of version 2.0 these files are accessed online at run-time for each query, rather than being bundled with each package release. As an alternative to these queries, users may use acs.tables.install to download and archive all current tables (approximately 10MB, as of version 2.0 release), which are saved by the package and consulted locally when present.

      Use of this function is completely optional and the package should work fine without it (assuming the computer is online and is able to access the lookup tables), but running it once may result in faster searches and quicker downloads for all subsequent sessions. (The results are saved and archived, so once a user has run the function, it is unnecessary to run again, unless the acs package is re-installed or updated.)
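Pulling these notes together, a typical version 2.0 session might look like the sketch below. The function calls follow the package documentation, but the table numbers and the dataset= value for Decennial data are illustrative and worth double-checking against acs.lookup() before relying on them.

```r
library(acs)

# one-time housekeeping after upgrading (both optional):
# api.key.migrate()      # only if the install scripts could not move your key
# acs.tables.install()   # cache the XML lookup tables locally (~10MB)

# endyear= is now required: this fetches the 2010-2014 5-year ACS
ma.pop <- acs.fetch(endyear = 2014, span = 5,
                    geography = geo.make(state = "MA", county = "*"),
                    table.number = "B01003")

# Decennial data uses the same function, selected via dataset=
# (the table number here is illustrative -- look it up first)
ma.pop.2010 <- acs.fetch(endyear = 2010, dataset = "sf1",
                         geography = geo.make(state = "MA", county = "*"),
                         table.number = "P001")
```

(Both calls require a working internet connection and an installed API key.)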

Other than these points, everything should run the same as the acs package you’ve come to know and love, and all your old scripts and data objects should still be fine. (Again, with the one big exception that you’ll need to add “endyear=XXXX” to any calls to acs.fetch and acs.lookup.)

Special thanks to package beta testers (Ari, Arin, Bethany, Emma, John, and Michael) and the entire acs-r community, as well as to Uwe and Kurt at CRAN for their infinite patience and continuing care and stewardship of the system.

Footnotes:

1. SF3 was discontinued after 2000 and replaced with the ACS.

2. Did you even know this was possible…???


Using acs.R to create choropleth maps

Posted by Ezra Glenn on July 15, 2013

Some time ago, FlowingData issued a challenge to create choropleth maps with open source tools, resulting in some nice little scripts using R’s ggplot2 and maps packages — all nicely covered on the Revolution Analytics blog. Some users have recently asked whether the acs package can be used to create similar maps, and the answer (of course) is yes. Here’s how.

For starters, to manage expectations, keep in mind that the map_data() function, which actually generates the geographic data you need to plot maps, does not currently provide boundary data for very many Census geographies — so sadly a lot of the new expanded support for various Census summary levels built into the acs package can’t be used. What you can do, however, is very easily plot state and county maps, which is what we’ll showcase below.

Secondly, the statistician in me feels compelled to point out one problem with using the acs package for these sorts of maps: choropleth maps are a fun way to quickly show off the geographic distribution of some key statistic, but there is a price: in order to plot the data, we are limited to a single number for each polygon. As a result, we can plot estimates from the ACS, but only if we are willing to ignore the margins of error, and pretend that the data is less “fuzzy” than it really is. Given that the whole point of the acs package was to call attention to standard errors, and to provide tools to work with them, it seems counter-intuitive to use the package in this way; but given the ease of downloading ACS data through the acs.fetch() function, it may still be a good (although slightly irresponsible) use of the package.

Given all that, I’ll step back down off my high horse, and show you how to make some maps, drawing heavily on the scripts provided by Hadley Wickham for using the ggplot2 package. For this example, let’s look at the percentage of people who take public transportation to work in the U.S., by county.

For starters, we’ll need to install and load the required packages:

> install.packages("acs")
> install.packages("ggplot2")
> install.packages("maps")
> library(acs)
> library(ggplot2)
> library(maps)

If you haven’t already obtained and installed an API key from the Census, you’ll need to do that as well — see ?api.key.install or check section 3.3 in the user guide.
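If you still need a key, the one-time installation step looks like this (the key string below is a placeholder for the one the Census emails you):

```r
library(acs)
# paste in the key you received from http://api.census.gov/data/key_signup.html
api.key.install(key = "<your-census-api-key>")
```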

Next, we use the map_data function to create some map boundary files to use later.

# load the boundary data for all counties
> county.df=map_data("county")
# rename fields for later merge
> names(county.df)[5:6]=c("state","county")
> state.df=map_data("state")

Turning to the acs package, we create a new geo.set consisting of all the counties in all the states, and in a single call to acs.fetch download Census table B08301, which contains data on Means of Transportation to Work. (If we didn’t know the table number we needed, we could use acs.lookup() to search for likely candidates, or even pass search strings directly to the acs.fetch() function.)

# wow! a single geo.set to hold all the counties...?
> us.county=geo.make(state="*", county="*")
# .. and a single command to fetch the data...!
> us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")

The data we’ve fetched includes estimates of raw numbers of workers, not percentages, so we’ll need to do some division. Since we are interested in a proportion (and not a ratio), we need to use the divide.acs function, not just “/”.1 For our dataset, the 10th column is the number of workers taking public transportation to work, and the 1st column is the total number of workers in the county. (See acs.colnames(us.transport) to verify this.) After we complete the division, we extract the estimates into a new data.frame, along with state and county names for each. (We need to do a little string manipulation to make these fields match those from the map_data function above.)

> us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
> pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
# this next step is all for Louisiana!
> pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
# clean up county names and find the states
> pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
> pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))

Next, following Wickham’s script, we merge the boundaries with the data into a new data.frame (called choropleth), then reorder and recode it for our map levels.

> choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
> choropleth=choropleth[order(choropleth$order), ]
> choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)

And voila – a single call to ggplot and we have our map!

> ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")

[Map: percentage of workers taking public transportation to work, by county]

(For those who would like to see the entire script in all its efficient glory, I’ve pasted it below for easy cutting and pasting.)

install.packages("acs")
install.packages("ggplot2")
install.packages("maps")
library(acs)
library(ggplot2)
library(maps)
county.df=map_data("county")
names(county.df)[5:6]=c("state","county")
state.df=map_data("state")
us.county=geo.make(state="*", county="*")
us.transport=acs.fetch(geography=us.county, 
     table.number="B08301", col.names="pretty")
us.pub.trans=divide.acs(numerator=us.transport[,10], 
     denominator=us.transport[,1], method="proportion")
pub.trans.est=data.frame(county=geography(us.pub.trans)[[1]], 
     percent.pub.trans=as.numeric(estimate(us.pub.trans)))
pub.trans.est$county=gsub("Parish", "County", pub.trans.est$county)
pub.trans.est$state=tolower(gsub("^.*County, ", "", pub.trans.est$county))
pub.trans.est$county=tolower(gsub(" County,.*", "", pub.trans.est$county))
choropleth=merge(county.df, pub.trans.est, by=c("state","county"))
choropleth=choropleth[order(choropleth$order), ]
choropleth$pub.trans.rate.d=cut(choropleth$percent.pub.trans, 
     breaks=c(0,.01,.02,.03,.04,.05,.1,1), include.lowest=T)
ggplot(choropleth, aes(long, lat, group = group)) +
     geom_polygon(aes(fill = pub.trans.rate.d), colour = "white", size = 0.2) + 
     geom_polygon(data = state.df, colour = "white", fill = NA) +
     scale_fill_brewer(palette = "Purples")


Footnotes:

1 Technically, since we are going to ignore the standard errors in our map, this could just be a standard division using “/”, but we might later want to look at the margins of error, etc. (For more on this issue, see ?divide.acs.)


acs.R version 1.1: PUMAs and Zip Codes and MSAs, Oh My!

Posted by Ezra Glenn on July 14, 2013

Development continues on the acs package for R, with the latest update (version 1.1) now officially available on the CRAN repository. If you’ve already installed the package, you can update with the update.packages() command; if you’ve never installed it, you can install it for the first time by simply typing install.packages("acs"). In either case, be sure to load the library after installing by typing library(acs), and install (or re-install) an API key with api.key.install(); see the documentation and the latest version of the acs user guide (which still references version 1.0).

Beyond improvements described in a previous post about version 1.0, the most significant change in the latest version is support for many more combinations of census geography via the geo.make function. As described in the manual and on-line help, users can now specify options to create user-defined geographies composed of combinations of states, counties, county subdivisions, tracts, places, blockgroups (all available in the previous version), plus many more: public use microdata areas (PUMAs), metropolitan statistical areas (MSAs), combined statistical areas (CSAs), zip code tabulation areas, census regions and divisions, congressional districts and state legislative districts (both upper and lower chambers), American Indian Areas, state school districts (of various types), New England County and Town Areas (NECTAs), and census urban areas. These geographies can be combined to create 25 different census summary levels, which can then be bundled together to make even more complex geo.sets.

Once created and saved, these new user-defined geo.sets can be fed into the existing acs.fetch function to immediately download data from the ACS for these areas, combining them as desired in the process (and handling all those pesky estimates and margins of error in statistically-appropriate ways).
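For example, some of the new summary levels can be created and fetched along these lines; the argument names follow the package manual as I understand it, and the specific codes are arbitrary illustrations:

```r
library(acs)
# a metropolitan statistical area, by CBSA code (14460 = greater Boston)
boston.msa <- geo.make(msa = 14460)
# a zip code tabulation area
cambridge.zip <- geo.make(zip.code = "02139")
# bundle them into one geo.set and fetch data for both in a single call
my.geos <- c(boston.msa, cambridge.zip)
acs.fetch(geography = my.geos, table.number = "B01003")
```

(As always, fetching requires an internet connection and an installed API key.)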

We encourage you to update to the latest version and begin to explore the full power of the census data now available through the Census American Community Survey API. (And be sure to subscribe to the acs.R user group mailing list to be informed of future improvements.)


acs.R example: downloading all the tracts in a county or state

Posted by Ezra Glenn on July 03, 2013

An acs.R user asks:

 
> How do I use acs to download all the census tracts? is there
> some handy command to do that?

Here’s some help:

All the tracts in a single county

You can’t automatically download all the tracts for the whole country (or even for an entire state) in a single step (but see below for ways to do this). If you just need all the tracts in a single county, it’s really simple — just use the “*” wildcard for the tract number when creating your geo.set.

The example below creates a geo.set for all the tracts in Middlesex County, Massachusetts, and then downloads data from ACS table B01003 on Total Population for them.

> my.tracts=geo.make(state="MA", county="Middlesex", tract="*") 
> acs.fetch(geography=my.tracts, table.number="B01003")

All the tracts in a state

If you happen to have a vector of the names (or FIPS codes) of all the counties in a given state (or just the ones you want), you could do something like this to get all the tracts in each of them:

> all.tracts=geo.make(state="MA", county=list.of.counties, 
  tract="*")
> acs.fetch(geography=all.tracts, table.number="B01003")

As an added bonus, if you don’t happen to have a list of counties, but want to use the package to get one, you could do something like this:

> mass=acs.fetch(geography=geo.make(state=25, county="*"), 
  table.number="B01003")

#  mass is now a new acs object with data for each county in
#  Massachusetts.  The "geography" function returns a dataframe of the
#  geographic metadata, which includes FIPS codes as the third
#  column.  So you can use it like this:

> all.tracts=geo.make(state="MA", 
  county=as.numeric(geography(mass)[[3]]), 
  tract="*", check=T)
> acs.fetch(geography=all.tracts, table.number="B01003")

All the tracts in the entire country

In theory, you could even use this to get all the tracts from all the 3,225 counties in the country:

> all.counties=acs.fetch(geography=geo.make(state="*", county="*"),
  table.number="B01003")
> all.tracts=geo.make(state=as.numeric(geography(all.counties)[[2]]),
  county=as.numeric(geography(all.counties)[[3]]), tract="*", check=T)

Unfortunately (or perhaps fortunately), this is just too much for R to download without changing some of the internal variables that limit this sort of thing — if you try, R will complain with “Error: evaluation nested too deeply: infinite recursion…” To prove to yourself that it works, you could limit the number of counties to just the first 250, and try that — it will get you from Autauga County, Alabama to Bent County, Colorado.

> some.counties=all.counties[1:250]
> some.tracts=geo.make(state=as.numeric(geography(some.counties)[[2]]), 
  county=as.numeric(geography(some.counties)[[3]]), tract="*", check=T)
> lots.of.data=acs.fetch(geography=some.tracts, table.number="B01003")

This is really a lot of data — on my machine, this took about 18 seconds, resulting in a new acs object containing population data on 11,872 different tracts. I haven’t checked to see what the upper limits are, but I imagine it wouldn’t take much to figure out a way to get tract-level data from all 3,225 counties. (But remember: with great power comes great responsibility — don’t be too rough on downloading stuff from the Census, even if it is free and easy.)

Using the built-in FIPS data

An alternative approach to these last two examples would be to use the FIPS datasets that we’ve built into the acs.R package. For example, the “fips.county” dataset includes the names of each county, by state. Feed this (or part of this) to your geo.make command and you can do all sorts of neat things.

> head(fips.county)
  State State.ANSI County.ANSI    County.Name ANSI.Cl
1    AL          1           1 Autauga County      H1
2    AL          1           3 Baldwin County      H1
3    AL          1           5 Barbour County      H1
4    AL          1           7    Bibb County      H1
5    AL          1           9  Blount County      H1
6    AL          1          11 Bullock County      H1
> 

So instead of the last block above, you could do something like this:

> random.counties=sample(x=3225,size=20, replace=F)
> some.tracts=geo.make(state=fips.county[random.counties,1], 
  county=fips.county[random.counties,3], tract="*", check=T)
Testing geography item 1: Tract *, Ponce Municipio, Puerto Rico .... OK.
Testing geography item 2: Tract *, Alleghany County, North Carolina .... OK.
Testing geography item 3: Tract *, Wayne County, Pennsylvania .... OK.
Testing geography item 4: Tract *, Comerio Municipio, Puerto Rico .... OK.
Testing geography item 5: Tract *, Lafayette County, Wisconsin .... OK.
Testing geography item 6: Tract *, Hartford County, Connecticut .... OK.
Testing geography item 7: Tract *, Real County, Texas .... OK.
Testing geography item 8: Tract *, Costilla County, Colorado .... OK.
Testing geography item 9: Tract *, Sarpy County, Nebraska .... OK.
Testing geography item 10: Tract *, McLennan County, Texas .... OK.
Testing geography item 11: Tract *, Donley County, Texas .... OK.
Testing geography item 12: Tract *, McIntosh County, Georgia .... OK.
Testing geography item 13: Tract *, Chilton County, Alabama .... OK.
Testing geography item 14: Tract *, Richland County, Montana .... OK.
Testing geography item 15: Tract *, Mitchell County, Kansas .... OK.
Testing geography item 16: Tract *, Muscogee County, Georgia .... OK.
Testing geography item 17: Tract *, Martin County, Indiana .... OK.
Testing geography item 18: Tract *, Naguabo Municipio, Puerto Rico .... OK.
Testing geography item 19: Tract *, Aguas Buenas Municipio, Puerto Rico .... OK.
Testing geography item 20: Tract *, Washington County, Arkansas .... OK.

> # you may get different counties in your random set
>
> acs.fetch(geography=some.tracts, table.number="B01003")

Which will return population data from all the tracts in a random set of 20 counties.


Now on CRAN: acs.R version 1.0

Posted by Ezra Glenn on June 25, 2013

We are pleased to announce that the acs.R package is now ready for prime-time: version 1.0 was officially released last week and is now available on CRAN.1 This version, developed in partnership with the Puget Sound Regional Council, includes all the enhancements described in this post, plus additional tweaks, and lots of documentation.

Just to recap, as of version 1.0:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing it into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies to organize and download data; and
  • The package provides two special “lookup” tools to help users filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. The acs.lookup function returns new “acs.lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.
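A quick sketch of the two lookup tools in action (the keyword and place names here are arbitrary examples):

```r
library(acs)
# find tables whose metadata mentions "transportation"
transport.tables <- acs.lookup(keyword = "transportation")
# find geographies whose names match a partial string
geo.lookup(state = "MA", county = "Middle")
# a saved acs.lookup result can then be passed to acs.fetch via variable=
# acs.fetch(geography = geo.make(state = "MA", county = "Middlesex"),
#           variable = transport.tables)
```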

I’ve also updated the user guide (version 1.0), which includes step-by-step instructions for working with the package, plus an extended example in the appendix on using blockgroup-level ACS data to create your own neighborhood geographies. (You can also view the complete package manual from the CRAN site.)

Finally, if you’re interested in staying in touch with the ongoing development of the package, be sure to sign up for the acs.R user group mailing list: to register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.

Footnotes:

1 Note: the latest version of this package is actually 1.01, which includes a few additional bug-fixes.


acs-r Mailing List: keep in the loop

Posted by Ezra Glenn on April 24, 2013

We’re pleased to announce the creation of a new mailing list for the acs.R package. The “acs” package allows users to download, manipulate, analyze, and visualize data from the American Community Survey in R; the “acs-r” e-mail list allows members to keep in touch and share information about the package, including updates from the development team concerning improvements, user questions and help requests, worked examples, and more. To register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.


acs.R: a worked example using blockgroup-level data

Posted by Ezra Glenn on March 11, 2013

A very nice user wrote the following in an email to me about the latest version of the acs.R package:

 
> Thanks for providing such a wonderful package in R. I'm having
> difficulty defining a geo at the block group level. Would you mind
> sharing an example with me?

I responded via email, but thought that my answer — which took the form of a short worked-example — might be helpful to others, so I am posting it here as well. Here’s what I said:

To showcase how the package can create new census geographies based on stuff like blockgroups, let’s look in my home state of Massachusetts, in Middlesex County. If I wanted to get info on all the block groups for tract 387201,1 I could create a new geo like this:

> my.tract=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group="*", check=T)
Testing geography item 1: Tract 387201, Blockgroup *, 
  Middlesex County, Massachusetts .... OK.
> 

(This might be a useful first step, especially if I didn’t know how many block groups there were in the tract, or what they were called. Also, note that check=T is not required, but can often help ensure you are dealing with valid geos.)

If I then wanted to get very basic info on these block groups – say, table number B01003 (Total Population), I could type:

> total.pop=acs.fetch(geo=my.tract, table.number="B01003")
> total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 3 1010 +/- 156
Block Group 4 938 +/- 214 
> 

Here we can see that block.group="*" has yielded the actual four block groups for the tract.

Now, if instead of wanting all of them, we only wanted the first two, we could just type:

> my.bgs=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group=1:2, check=T)
Testing geography item 1: Tract 387201, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387201, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
> 

And then:

> bg.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> bg.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
> 

Now, if we wanted to add in some blockgroups from tract 387100 (a.k.a. “tract 3871” — but remember: we need those trailing zeroes) – say, blockgroups 2 and 3 – we could enter:

> my.bgs=my.bgs+geo.make(state="MA", county="Middlesex", 
  tract=387100, block.group=2:3, check=T)
Testing geography item 1: Tract 387100, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387100, Blockgroup 3, 
  Middlesex County, Massachusetts .... OK.

And then:

> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 2 827 +/- 171 
Block Group 3 1821 +/- 236
> 

Note that the short rownames can be confusing – as in this example — but if you type:

> geography(new.total.pop)
           NAME state county  tract blockgroup
1 Block Group 1    25     17 387201          1
2 Block Group 2    25     17 387201          2
3 Block Group 2    25     17 387100          2
4 Block Group 3    25     17 387100          3
> 

you can see that the two entries for “Block Group 2” are actually in different tracts. (Also note: you can combine block groups and other levels of geography, all in a single geo object…)

And now, to show off the coolest part! Let’s say I don’t just want to get data on the four blockgroups, but I want to combine them into a single new geographic entity. Before downloading, I could simply say:

> combine(my.bgs)=T
> combine.term(my.bgs)="Select Blockgroups"
> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                   B01003_001               
Select Blockgroups 6281 +/- 481.733328720362
>

And see – voila! – it sums the estimates and deals with the margins of error, so you don’t need to get your hands dirty with square roots and standard errors and all that messy stuff.

You can even create interesting nested geo.sets, where some of the lower levels are combined, like this:

> combine.term(my.bgs)="Select Blockgroups, 
  Tracts 387100 and 387201"
> more.bgs=c(my.bgs, geo.make(state="MA", 
  county="Middlesex", tract=370300, block.group=1:2, check=T), 
  geo.make(state="MA", county="Middlesex", tract=370400, 
  block.group=1:3, combine=T, combine.term="Select Blockgroups, 
  Tract 3703", check=T)) 
Testing geography item 1: Tract 370300, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370300, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 1: Tract 370400, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370400, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 3: Tract 370400, Blockgroup 3, 
 Middlesex County, Massachusetts .... OK.
> more.total.pop=acs.fetch(geo=more.bgs, table.number="B01003")
> more.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                                             B01003_001               
Select Blockgroups, Tracts 387100 and 387201 6281 +/- 481.733328720362
Block Group 1                                315 +/- 132              
Block Group 2                                1460 +/- 358             
Select Blockgroups, Tract 3703               2594 +/- 487.719181496894
> 

In closing: I hope this helps, and be sure to contact me if you have other questions/problems about using the package.

Footnotes:

1 Note that tracts are often referred to in a strange “four-digit+decimal extension” shorthand, so “tract 387201” may also be known as “tract 3872.01”. When working with this package, be careful to always use six-digit tract numbers without the decimal point. If the tract number seems to be only four digits long, add two “trailing” zeroes at the end.
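One way to avoid mistakes here is a tiny helper function (hypothetical, not part of the acs package) that converts the decimal shorthand to the six-digit form:

```r
# convert "3872.01"-style tract shorthand to the six-digit integer form
# (hypothetical helper; not part of the acs package)
tract.number <- function(x) {
  as.integer(round(as.numeric(x) * 100))
}
tract.number("3872.01")   # 387201
tract.number("3872")      # 387200
```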


Major improvements to acs.R: sneak peek at version 1.0

Posted by Ezra Glenn on February 05, 2013

It’s been a while since I last updated the acs.R package, but as noted here, I’ll be using CityState to provide updates and test-versions of the package prior to uploading to CRAN. I’m happy to report that we now have a near-final version 1.0 of the package.

The most significant improvements to the package (beyond those mentioned previously) are the following:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing it into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies to organize and download data; and
  • The package provides two special “lookup” tools to help users filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. These functions return new R “lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.

I want to thank the very kind folks at the Puget Sound Regional Council, who have been supporting the development of this package (in exchange for some special attention to scripts and functions they really want to include for themselves and their member communities). They continue to provide excellent help and advice “from the trenches” as we refine the package.

If you’re interested in trying out the new version, you can download it below, along with a brief set of “Introductory Notes” written for the team at PSRC. (Users may also want to check out the manual for the previous version of the package and this article from 2011 on the package.)


Pitfalls of Working with Census Data

Posted by Ezra Glenn on July 27, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood / No Comments

Previous missions have demonstrated a whole lot of things you can do with census data. Here are a few of the problems you can get yourself into.

Census Geography Pitfalls

  • Unequal tracts: Despite what you may think, not all Census tracts (and their composite block groups and blocks) are created equal. The Census Bureau tries to structure their geography so that all tracts will be approximately the same size (about 4,000 people), but in practice there is a pretty large range (between 1,500 and 8,000 people per tract). If you are looking at raw numbers (counts of any sort), be sure to think about the overall population—it’s the denominator you’ll need to put the figures in perspective; conversely, if you are looking at percentages, remember that a small percentage of a large tract could actually be more people than a large percentage of a very small one.
  • Overlapping districts: Unfortunately, although the formal “pyramid” of Census geography is well-structured—building from block to block group to tract and so on up—our political and cultural divisions are not always so straightforward: cities sometimes spread across county lines, metropolitan areas may even cross state lines, and legislative districts have become a gerrymandered mess that would drive any rational cartographer to drink. As a result, there may be times when Census geographers have been forced to choose between a strictly “nested” geography that ignores higher-order political elements, and one with intermediate levels that do not fit neatly within each other.
  • Confusing or ambiguous place names: Partially related to the previous point, and partially due to the general orneriness of the culture (or perhaps the species), there are often times when the same name will occur in multiple places in Census geography. The name “New York” refers to a state, a metro region, a city, a county (strangely, one that is smaller than its city), and even an avenue in Atlantic City (or on the Monopoly Board). Luckily, once you get down to the level of census tracts and below, you enter the realm of pure-and-orderly numbers, and can largely avoid this trap—they are even sometimes referred to as “logical record numbers,” or LOGRECNO—although it’s a lot less fun to say “AR census tract 9803 block group 3” when you could be saying “Goobertown, Arkansas”.
  • Changing boundaries: Occasionally the Census Bureau needs to redraw the lines for some particular location—perhaps a city has annexed new land, or a large county has been split by an act of the state legislature. In these situations, you may see a sharp rise (or drop) in the counts from one Census to the next. For example, according to the 2000 census, the city of Bigfork, Montana had 1,421 people; in the 2010 census, this figure had grown to 4,270—a seeming tripling of the population. However, upon closer scrutiny, it turns out that most of this increase was the result of a change in census boundaries. (These situations may also exacerbate some of the previous problems.)
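To make the first point concrete, here is some quick arithmetic using tract sizes at the two extremes of the typical range:

```r
# a small percentage of a large (8,000-person) tract ...
0.05 * 8000   # 400 people

# ... can be more people than a large percentage of a small (1,500-person) tract
0.20 * 1500   # 300 people
```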


Examining Historical Growth III: The forecast() package

Posted by Ezra Glenn on April 21, 2012
Data, Missions, Shape Your Neighborhood, Simulation / No Comments

In our last mission we used R to plot a trend-line for population growth in Houston, based on historical data from the past century. Depending on which of two different methods we used, we arrived at an estimate for the city’s 2010 population of 2,144,531 (based on the 100-year growth trend for the city) or 2,225,125 (based on the steeper growth trend of the past fifty years). Looking now at the official Census count for 2010, it turns out that our guesses were close, but both too high: the actual reported figure for 2010 is 2,099,451.

It would have been surprising to have guessed perfectly based on nothing other than a linear trend — and the fact that we came as close as we did speaks well of this sort of “back of the envelope” projection technique (at least in the case of steady growth). But there was a lot of information contained in those data points that we essentially ignored: our two trendlines were really based on nothing more than a start and an end point.

A more sophisticated set of tools for making projections — which may be able to extract some extra meaning from the variation contained in the data — is provided in R by the excellent forecast package, developed by Rob Hyndman of Monash University in Australia. To access these added functions, you’ll need to install it:

> install.packages("forecast")
> library(forecast)

Time-series in R: an object with class

Although R is perfectly happy to help you analyze and plot time-series data organized in vectors and dataframes, it actually has a specialized object class for this sort of thing, created with the ts() function. Remember: R is an “object-oriented” language. Every object (a variable, a dataframe, a function, a time series) is associated with a certain class, which helps the language figure out how to manage and interact with it. To find the class of an object, use the class() function:

> a=c(1,2)
> class(a)
[1] "numeric"
> a=TRUE
> class(a)
[1] "logical"
> class(plot)
[1] "function"
> a=ts(1)
> class(a)
[1] "ts"
> 
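Beyond simply marking an object’s class, ts() can record when a series starts and how often it is observed, information that plotting and forecasting functions then use automatically. A minimal sketch (the numbers here are made up):

```r
# an annual series beginning in the year 2000
x <- ts(c(100, 110, 125, 150), start = 2000, frequency = 1)

class(x)   # "ts"
start(x)   # 2000 1 -- time of the first observation
end(x)     # 2003 1 -- time of the last observation
```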

