acs.R Warning / Guidance: working with combined geo.sets

Posted by Ezra Glenn on April 04, 2016

If you have been using the acs package to create custom geo.sets which combine existing census geographies (i.e., geo.sets with “combine=T”), please read on.

It has come to my attention that some users working with custom combined geo.sets may be introducing errors into their data if they attempt to combine census variables dealing with medians, percentages, or similar derived summary data.

Most of the data available through the package (via the ACS and the Decennial Census APIs) comes in the form of raw counts – numbers of people, households, commuters, etc. When a geo.set includes multiple elements and “combine=T”, the package will fetch the data, and then combine the geographies by (1) adding the estimates and (2) calculating the standard errors of these aggregate estimates. This procedure is absolutely proper for count-data, but it is not appropriate for median incomes (or median ages, or mean incomes, or mean travel times, or derived percentages, etc.).
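For count data, the aggregation described above amounts to two steps. First, sum the estimates. Second, combine the standard errors in quadrature: the aggregate SE is the square root of the sum of the squared SEs, which is the Census Bureau's usual approximation for sums. The sketch below is plain base R with made-up tract numbers and does not call the acs package. I believe the package's own aggregation follows the same rule, but treat that as an assumption.

```r
# Hypothetical household counts for three tracts (made-up numbers)
tract.est <- c(1200, 950, 1430)
# Hypothetical standard errors for those counts
tract.se  <- c(80, 60, 110)

# Aggregate estimate: a simple sum, which is appropriate for counts
combined.est <- sum(tract.est)

# Aggregate standard error: square root of the sum of squared SEs
combined.se <- sqrt(sum(tract.se^2))

# ACS margins of error are reported at 90% confidence (z = 1.645)
combined.moe <- 1.645 * combined.se

print(c(estimate = combined.est, se = combined.se, moe90 = combined.moe))
```

Applied to three median incomes, the same first step would simply add them up. That is exactly the problem described next.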

For example, if you attempt to aggregate three tracts with median incomes of $25,000, $35,000, and $50,000 into a single neighborhood, acs.fetch will return a neighborhood with an “aggregate” median income of $110,000: wrong.

A quick demonstration:

> all.us=geo.make(state=fips.state[1:51,2], combine=T)
> median.income=acs.fetch(geography=all.us, table.number="B06011", endyear=2014, span=1)
> median.income

Try this and you’ll see that the country’s “median income” is $1,394,002…

In the package’s defense, there really isn’t a proper way to aggregate median incomes like this. Since medians – or means, or percentages – are derived from underlying data, they are really “summaries,” and without at least some more info about the underlying data you can’t always properly combine them. So, in the example above, we know that the median income for the neighborhood is somewhere between $25,000 and $50,000, but not really where. We can take a median of the medians ($35,000), or a mean of the medians ($110,000 / 3 = $36,667), but these are just guesses as well: without knowing how many observations there were in each tract and what their incomes were, we simply can’t calculate it. (This is why I didn’t think it would be an issue – but now I’m thinking at least a stronger warning somewhere would be a good idea, hence this post and some new language I’ll add to the guidance docs.)
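To make this concrete, here is a base-R sketch with made-up household incomes. In both scenarios the three tracts have the same medians ($25,000, $35,000, and $50,000), yet the median of the pooled households differs. No formula working from the tract medians alone can recover it. All numbers are hypothetical.

```r
# Hypothetical household incomes; each tract has the same median in both scenarios
scenario.a <- list(tract1 = c(20000, 25000, 30000),
                   tract2 = c(30000, 35000, 40000),
                   tract3 = c(45000, 50000, 55000))
scenario.b <- list(tract1 = c(24000, 25000, 26000),
                   tract2 = c(34000, 35000, 36000),
                   tract3 = c(27000, 50000, 90000))

# Tract medians are identical across the two scenarios
tract.medians <- sapply(scenario.a, median)
stopifnot(identical(tract.medians, sapply(scenario.b, median)))

# What summing (as combine=T does for counts) would report
sum(tract.medians)      # 110000: not a median of anything
# Two guesses that use only the medians
median(tract.medians)   # 35000
mean(tract.medians)     # 36666.67

# The true neighborhood medians need the underlying observations
median(unlist(scenario.a))  # 35000
median(unlist(scenario.b))  # 34000
```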

Please note that this issue only occurs when users create geo.sets with multiple elements and then combine them (by setting “combine=T” in the geo.set) before passing them to acs.fetch to download data. As long as you are not combining multiple tracts, counties, blockgroups, etc., the package is still fine for fetching and working with median incomes, percentages, and the like. (But be careful: your own code can introduce the same mistake if it combines this sort of data.)
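With derived percentages there is usually a safe route, provided you also have the underlying counts. Sum the numerators and the denominators separately, then divide afterward. A base-R sketch with hypothetical counts for two tracts follows. The variable names and numbers are made up for illustration.

```r
# Hypothetical counts for two tracts: e.g., households without a car (numerator)
# and total households (denominator); numbers are made up
numerator   <- c(tract.a = 30, tract.b = 10)
denominator <- c(tract.a = 100, tract.b = 400)

tract.pct <- 100 * numerator / denominator   # 30% and 2.5%

# Wrong for the combined area: averaging (or adding) the tract percentages
mean(tract.pct)                              # 16.25

# Right: aggregate the counts first, then re-derive the percentage
100 * sum(numerator) / sum(denominator)      # 8
```

The average of the percentages ignores that tract B has four times as many households, which is the same "summary of a summary" problem described above.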

Please pass on this info to your colleagues who may be using the package, and be sure to check your code if it (a) deals with combined geo.sets and (b) downloads non-count data. If you have any questions or concerns, by all means contact me at eglenn@mit.edu and I’ll be happy to discuss more.

Thanks, and sorry if this wasn’t clear.

API Update: Back online now (3/27/2016)

Posted by Ezra Glenn on March 27, 2016

Update: the API seems to be back up. Sorry for the outage, and fetch away!

API Update: Unexplained Census API Outage, 3/26/2016

Posted by Ezra Glenn on March 26, 2016

Warning:

For some reason, the census API seems to be down today (Sat March 26, 2016), so you may be getting errors when you try to fetch data. They look like this:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
No data found at:
  http://api.census.gov/data/2014/acs5?... [with lots more stuff here]

The error has nothing to do with the acs package or R, but is a census-side problem. To my knowledge, the API has been pretty reliable, so let’s hope it’s back up soon. Sorry for any problems all the same.
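When the API is down, every fetch in a script fails the same way. If you run scripts unattended, one option is to wrap the call in a small retry loop. The helper below, `with.retry()`, is hypothetical and not part of the acs package. It calls whatever function you hand it, waits, and tries again a few times before giving up. The commented usage line shows how it might wrap an acs.fetch() call; `my.geo` there is a placeholder for your own geo.set.

```r
# Hypothetical helper (not part of the acs package): call f() up to `tries` times,
# sleeping `wait` seconds between failed attempts, then re-raise the last error
with.retry <- function(f, tries = 3, wait = 60) {
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < tries) Sys.sleep(wait)
  }
  stop(result)
}

# Example use (commented out; needs the acs package, an API key, and a network):
# my.data <- with.retry(function() acs.fetch(geography = my.geo,
#                                            table.number = "B01003",
#                                            endyear = 2014))
```

This only helps with short outages. If the API stays down, the final error is passed through unchanged so the script still stops.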

acs version 2.0: now on CRAN

Posted by Ezra Glenn on March 14, 2016

After far too long, we are pleased to release version 2.0 of the acs package.

The biggest improvement is full support for all ACS, SF1, and SF3 data currently available via the Census API, including ACS data from 2005-2014 and Decennial data from 1990, 2000, and 2010. (See below for more info.)

1 Downloading and installing

To install the updated version, simply fire up an R session and type:

> install.packages("acs", clean=T)

2 Learn more

To learn more about the package, see the following:

And be sure to join the acs.R User Group Mailing List.

3 Notes and updates

A few notes about this new package:

  • API Keys: by default, when R updates a package, it overwrites the old package files. Unfortunately, that is where archived api.keys get saved by api.key.install(). As part of the version 2.0 package installation, “configure” and “cleanup” scripts can be run which try to migrate the key to a new location. If this fails, the install script will suggest that users run api.key.migrate() after installation, which might resolve the issue. At worst, if both methods fail, a user can simply re-run api.key.install() with the original key and be good to go.
  • endyear now required: under the old package, acs.fetch and acs.lookup would default to endyear=2011 when no endyear was provided. This seemed smart at the time – 2011 was the most recent data available – but it is becoming increasingly absurd. One solution would have been to change the default to be whatever data is most recent, but that would have the unintended result of making the same script run differently from one year to the next: bad mojo. So the new preferred “version 2.0 solution” is to require users to explicitly indicate the endyear that they want to fetch each time. Note that this may require some changes to existing scripts.
  • ACS Data Updates: the package now provides on-board support for all endyears and spans currently available through the API, including:
    • American Community Survey 5-Year Data (2005-2009 through 2010-2014)
    • American Community Survey 3-Year Data (2013, 2012)
    • American Community Survey 1-Year Data (2014, 2013, 2012, 2011)

    See http://www.census.gov/data/developers/data-sets.html for more info, including guidance about which geographies are provided for each dataset.

  • Decennial Census Data: for the first time ever, the package now also includes the ability to download Decennial Data from the SF1 and SF3, using the same acs.fetch() function used for ACS data.
    • SF1/Short-Form (1990, 2000, 2010)
    • SF3/Long-Form (1990, 2000)[1]

    When fetched via acs.fetch(), this data is downloaded and converted to acs-class objects. (Note: standard errors for Decennial data will always be zero, which is technically not correct for SF3 survey data, but no margins of error are reported by the API.) See http://www.census.gov/data/developers/data-sets/decennial-census-data.html for more info.

    Also note that census support for the 1990 data is a bit inconsistent – the variable lookup tables were not in the same format as others, and far less descriptive information has been provided about table and variable names. This can make it tricky to find and fetch data, but if you know what you want, you can probably find it; looking in the files in the package’s extdata directory might help give you a sense of what the variable codes and table numbers look like.

  • Other improvements/updates/changes:
    • CPI tables: the CPI tables used for currency.year() and currency.convert() have been updated to include data up through 2015.
    • acs.fetching with saved acs.lookup results: the results of acs.lookup can still be saved and passed to acs.fetch via the “variable=” option,[2] with a slight change: under v. 1.2, the passed acs.lookup results would overrule any explicit endyear or span; with v. 2.0, the opposite is true (the endyear and span in the acs.lookup results are ignored by acs.fetch). This may seem insignificant, but it will eventually be important, when users want to fetch data from years that are more recent than the version of the package, and need to use old lookup results to do so.
    • divide.acs fixes: the package includes a more robust divide.acs() function, which handles zero denominators better and takes full advantage of the potential for reduced standard errors when dividing proportions.
    • acs.tables.install: to obtain variable codes and other metadata needed to access the Census API, both acs.fetch and acs.lookup must consult various XML lookup files, which are provided by the Census with each data release. To keep the size of the acs package within CRAN guidelines and to ensure tables will always be up-to-date, as of version 2.0 these files are accessed online at run-time for each query, rather than being bundled with each package release. As an alternative to these queries, users may use acs.tables.install to download and archive all current tables (approximately 10MB, as of version 2.0 release), which are saved by the package and consulted locally when present.

      Use of this function is completely optional and the package should work fine without it (assuming the computer is online and is able to access the lookup tables), but running it once may result in faster searches and quicker downloads for all subsequent sessions. (The results are saved and archived, so once a user has run the function, it is unnecessary to run again, unless the acs package is re-installed or updated.)
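The "reduced standard errors when dividing proportions" mentioned above come from the Census Bureau's published approximation for a proportion p = X/Y, where X is a subset of Y. In that formula, the denominator's error is subtracted under the square root, and the calculation falls back to the larger ratio formula when that difference goes negative. The base-R function below is my own sketch of that textbook formula, not the package's code. I believe divide.acs() exposes this through a method="proportion" option, but check ?divide.acs for exactly what the package does.

```r
# Standard error of a derived ratio or proportion p = x / y, following the
# Census Bureau's published approximation (a sketch, not the acs package's code)
prop.se <- function(x, se.x, y, se.y, method = c("proportion", "ratio")) {
  method <- match.arg(method)
  if (y == 0) return(NA_real_)              # undefined with a zero denominator
  p <- x / y
  if (method == "proportion") {
    under.root <- se.x^2 - p^2 * se.y^2     # denominator's error is subtracted
    if (under.root >= 0) return(sqrt(under.root) / y)
    # If the difference is negative, fall through to the ratio formula below
  }
  sqrt(se.x^2 + p^2 * se.y^2) / y           # ratio formula: the errors add
}

prop.se(400, 30, 1000, 40)            # proportion formula, about 0.0254
prop.se(400, 30, 1000, 40, "ratio")   # ratio formula, 0.034
```

For a 40% share with these hypothetical standard errors, the proportion formula gives an SE of roughly 2.5 percentage points, against 3.4 from the ratio formula.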

Other than these points, everything should run the same as the acs package you’ve come to know and love, and all your old scripts and data objects should still be fine. (Again, with the one big exception that you’ll need to add “endyear=XXXX” to any calls to acs.fetch and acs.lookup.)

Special thanks to package beta testers (Ari, Arin, Bethany, Emma, John, and Michael) and the entire acs-r community, as well as to Uwe and Kurt at CRAN for their infinite patience and continuing care and stewardship of the system.

Footnotes:

[1]

SF3 was discontinued after 2000 and replaced with the ACS.

[2]

did you even know this was possible…???


acs version 1.3: test-drive it now

Posted by Ezra Glenn on March 06, 2016

After far too long, we are nearing completion of version 1.3 of the acs package. As a special benefit to our loyal readers on CityState and members of the acs.R mailing list, we are making available a special sneak-peek, pre-release version for you to try out. The biggest improvement is full support for all ACS, SF1, and SF3 data currently available via the Census API, including ACS data from 2005-2014 and Decennial data from 1990, 2000, and 2010. (See below for more info.)


ACS 2010-2014 Data Now Available

Posted by Ezra Glenn on December 04, 2015

Just in time for the holidays, the Census has released new American Community Survey data, covering all states, counties, cities, and towns, down to the census tract and block-group level for the 2010–2014 five-year period. Luckily, the data is also available via the Census API, which means it is available to users of the acs.R package (version 1.2 or later; if you’re not sure which version you are using, you can always type packageVersion("acs") to find out.)

To get the latest data, just continue to use the acs.fetch() function as usual, but specify endyear=2014.[1] Also, be aware that the function will give you some warnings about how “As of the date of this version of the acs package Census API did not provides data for selected endyear” – but you can safely ignore that, and the data will still be fetched.

Happy downloading!

Footnotes:

[1]

Note that by default, endyear is set to 2011 if no year is explicitly passed to acs.fetch, and I didn’t want to change this for fear of breaking existing user scripts. In the future, we might want to rethink this, so that it selects the most recent endyear by default.

Making maps with ACS data

Posted by Ezra Glenn on August 26, 2015

A new blog post on RevolutionAnalytics describes a way to use the acs.R and choroplethr packages in R to make maps based on data from the US Census American Community Survey.

A user asks…: acs.R and the 2013 census data

Posted by Ezra Glenn on August 17, 2015

An acs.R user asks:

Are there any plans for 2013 data to be incorporated into the acs
package?

Great question. Here is a great answer:

At present, the package is actually able to fetch the 2013 5-year ACS data, with two important caveats:

  1. you must specify the table number or variable number directly – you can’t use keywords, since the current version of the package lacks the correct lookup tables for 2013; and
  2. the acs.fetch function will give you some warnings about how “As of the date of this version of the acs package Census API did not provides data for selected endyear” – but you can safely ignore that.

See below for a basic example. (This said, I do want to release an updated version soon that will include the lookup tables and avoid the warnings.)

> acs.fetch(geography=geo.make(state=25, county="*"), table.number="B01003", endyear=2013)
ACS DATA: 
 2009 -- 2013 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                                 B01003_001   
Barnstable County, Massachusetts 215449 +/- 0 
Berkshire County, Massachusetts  130545 +/- 0 
Bristol County, Massachusetts    549870 +/- 0 
Dukes County, Massachusetts      16739 +/- 0  
Essex County, Massachusetts      750808 +/- 0 
Franklin County, Massachusetts   71408 +/- 0  
Hampden County, Massachusetts    465144 +/- 0 
Hampshire County, Massachusetts  159267 +/- 0 
Middlesex County, Massachusetts  1522533 +/- 0
Nantucket County, Massachusetts  10224 +/- 0  
Norfolk County, Massachusetts    677296 +/- 0 
Plymouth County, Massachusetts   497386 +/- 0 
Suffolk County, Massachusetts    735701 +/- 0 
Worcester County, Massachusetts  802688 +/- 0 
Warning messages:
1: In acs.fetch(geography = geo.make(state = 25, county = "*"), table.number = "B01003",  :
  As of the date of this version of the acs package
  Census API did not provides data for selected endyear
2: In acs.fetch(endyear = endyear, span = span, geography = geography[[1]],  :
  As of the date of this version of the acs package
  Census API did not provides data for selected endyear
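If those warnings clutter your output, you can muffle just that one message while letting any other warnings through. Below is a base-R sketch using withCallingHandlers(). `quiet.endyear()` is a hypothetical helper name, and it matches on the warning text shown above, so it would stop working if that wording changed. The fake function stands in for acs.fetch() so the example runs offline.

```r
# Hypothetical helper: evaluate expr, muffling only the "did not provides data
# for selected endyear" warning; all other warnings are passed along unchanged
quiet.endyear <- function(expr) {
  withCallingHandlers(expr, warning = function(w) {
    if (grepl("did not provides data for selected endyear",
              conditionMessage(w), fixed = TRUE))
      invokeRestart("muffleWarning")
  })
}

# Stand-in for acs.fetch() that emits the same warning text (demonstration only)
fake.fetch <- function() {
  warning("As of the date of this version of the acs package\n  Census API did not provides data for selected endyear")
  "data"
}

result <- quiet.endyear(fake.fetch())   # returns "data" without printing that warning

# Real use might look like:
# quiet.endyear(acs.fetch(geography = geo.make(state = 25, county = "*"),
#                         table.number = "B01003", endyear = 2013))
```

Muffling only the matching message is safer than wrapping everything in suppressWarnings(), which would also hide unrelated problems.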

Presenting acs.R at the ACS Data User Conference

Posted by Ezra Glenn on April 06, 2015

On May 12, 2015, I’ll be presenting the acs.R package in a session of the American Community Survey Data User Group Conference in Hyattsville, MD. The paper, titled “Estimates with errors and errors with estimates: Using the R ‘acs’ package for analysis of American Community Survey data,” is available through SSRN or my faculty publications webpage.

Better yet, the session will also include a presentation by Michael Laviolette, Dennis Holt, and Kristin K. Snow of the State of New Hampshire Department of Health and Human Services on “Using the R Language and ‘acs’ Package to Compile and Update a Social Vulnerability Index for New Hampshire.” It’s great to see how planners are using and extending this package in all sorts of exciting new settings and applications.


acs.R Question: using FIPS codes as rownames

Posted by Ezra Glenn on October 28, 2014

Q: An acs.R user asks:

Is it possible for an acs object to use FIPS codes for rownames?

A: Absolutely. Here’s how:

Start with some data:

> some.geo=geo.make(state=25, county=001, tract="*")
> some.data=acs.fetch(geography=some.geo, table.number="B01003", endyear=2011)

Check out the geography functions:

> head(geography(some.data))
> 

The output of the final command should display the start of a dataframe with descriptive titles in the first column, but then FIPS codes for the state, county, and tract. When displaying acs objects, the first column of the object’s geography() dataframe is automatically used to name the rows. But this can be changed – see ?geography.

To use FIPS codes instead, we can extract the relevant columns from the object’s geography() and paste them together to recreate fully qualified FIPS codes. (The relevant columns are everything except the first one, so “geography(some.data)[-1]” will do the trick.)

> my.fips.codes=apply(X=geography(some.data)[-1], MARGIN=1, FUN=paste, collapse="")
> 

Then we can re-assign the object’s geography() to include these codes as the first column:

> geography(some.data)=cbind(my.fips.codes, geography(some.data))
> 

Now try:

> head(some.data)
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
          B01003_001  
251010100 2994 +/- 13 
251010206 2858 +/- 256
251010208 1903 +/- 260
251010304 2395 +/- 269
251010306 2616 +/- 270
251010400 3056 +/- 296
>  

and you should see FIPS codes as the rownames.

Important note: Given that I actually don’t work with FIPS codes all that often, there is a chance I’ve deviated slightly from the proper formatting here – you may need to paste in extra leading zeroes to make sure the pieces line up in that apply/paste command – but hopefully you get the idea. (For example, a fully qualified tract code should be eleven digits: two for the state, three for the county, and six for the tract. In the output above the county shows up as “1” rather than “001”, so those codes come out two digits short.)
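A slightly more careful version of that apply/paste step pads each piece to its standard width (2 digits for state, 3 for county, 6 for tract) with sprintf(). The data frame below is a hand-made stand-in for geography(some.data). I'm assuming its columns are named state, county, and tract, so check names(geography(some.data)) before adapting it. The NAME values are placeholders.

```r
# Hand-made stand-in for geography(some.data); column names are assumptions
geo <- data.frame(NAME   = c("tract 1 (placeholder)", "tract 2 (placeholder)"),
                  state  = c(25, 25),
                  county = c(1, 1),
                  tract  = c("010100", "010206"),
                  stringsAsFactors = FALSE)

# Zero-pad each piece to its standard FIPS width, then join them:
# state (2 digits) + county (3 digits) + tract (6 digits) = 11 digits
my.fips.codes <- sprintf("%02d%03d%06d",
                         as.integer(geo$state),
                         as.integer(geo$county),
                         as.integer(geo$tract))
my.fips.codes
# [1] "25001010100" "25001010206"
```

The resulting codes can then be bound in as the first column with the same cbind() step shown earlier.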