
Examining Historical Growth I: Basic trends

Posted by Ezra Glenn on April 11, 2012
Data, Missions, Shape Your Neighborhood, Simulation / No Comments

The nature of predictions

To paraphrase John Allen Paulos, author of A Mathematician Reads the Newspaper, all expert predictions can be essentially restated in one of two ways: “Things will continue roughly as they have been until something changes”; and its corollary, “Things will change after an indeterminate period of stability.” Although these statements are both true and absurd, they contain a kernel of wisdom: simply assuming a relative degree of stability and painting a picture of the future based on current trends is the first step of scenario planning. The trick, of course, is to never completely forget the “other shoe” of Paulos’s statement: as the disclaimer states on all investment offerings, “Past performance is not a guarantee of future results”; at some point in the future our present trends will no longer accurately describe where we are headed. (We will deal with this as well, with a few “safety valves.”)

From the second stage of the Rational Planning Paradigm (covered in the background sections of the book) we should have gathered information on both past and present circumstances related to our planning effort. If we are looking at housing production, we might have data on annual numbers of building permits and new subdivision approvals, mortgage rates, and housing prices; if we are looking at public transportation we might need monthly ridership numbers, information on fare changes, population and employment figures, and even data on past weather patterns or changes in vehicle ownership and gas prices. The first step of projection, therefore, is to gather relevant information and get it into a form that you can use.

Since we will be thinking about changes over time in order to project a trend into the future, we’ll need to make sure that our data includes time as an element: a series of data points with one observation for each point or period of time is known as a time series. The exact units of time are not important—they could be days, months, years, decades, or something else—but it is customary (and important) to obtain data where the points are regularly spaced at even intervals. Essentially, time series data is a special case of multivariate data in which we treat time itself as an additional variable and look for relationships as it changes. Luckily, R has some excellent functions and packages for dealing with time-series data, which we will touch on below. For starters, however, let’s consider a simple example to start thinking about what goes into projections.
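As a quick illustration, base R’s built-in ts class handles exactly this sort of regularly spaced data; the annual building-permit counts below are invented for the example:

permits=c(120, 135, 150, 148, 160, 172, 165, 158, 140, 110)
permits.ts=ts(permits, start=2000, frequency=1)  # one observation per year, starting in 2000
plot(permits.ts)                 # time is automatically placed on the x-axis
window(permits.ts, start=2005)   # extract the sub-series from 2005 onward

Continue reading…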


acs Package at Upcoming Conference: UseR! 2012

Posted by Ezra Glenn on April 09, 2012
Census, Code, Self-promotion / No Comments

I’m happy to report that I’ll be giving a paper on my acs package at the 8th annual useR! conference, coming June 12-15 to Vanderbilt University in Nashville, TN. The paper is titled “Estimates with Errors and Errors with Estimates: Using the R acs Package for Analysis of American Community Survey Data.” Here’s the abstract:


"Estimates with Errors and Errors with Estimates: Using the R acs
Package for Analysis of American Community Survey Data"
Ezra Haber Glenn

Over the past decade, the U.S. Census Bureau has implemented the
American Community Survey (ACS) as a replacement for its traditional
decennial ``long-form'' survey.  Last year—for the first time
ever—ACS data was made available at the census tract and block group
level for the entire nation, representing geographies small enough to
be useful to local planners; in the future these estimates will be
updated on a yearly basis, providing much more current data than was
ever available in the past.  Although the ACS represents a bold
strategy with great promise for government planners, policy-makers,
and other advocates working at the neighborhood scale, it will require
them to become comfortable with statistical techniques and concerns
that they have traditionally been able to avoid.

To help with this challenge the author has been working with
local-level planners to determine the most common problems associated
with using ACS data, and has implemented these functions as a package
in R.  The package—currently hosted on CRAN in version 0.8—defines
a new ``acs'' class object (containing estimates, standard errors, and
metadata for tables from the ACS), with methods to deal appropriately
with common tasks (e.g., combining subgroups or geographies,
mathematical operations on estimates, tests of significance, plots of
confidence intervals, etc.).

This paper will present both the use and the internal structure of the
package, with discussion of additional lines of development.

Hope to see you all there!


A Richer Neighborhood Profile, Part I: Getting tract-level data

Posted by Ezra Glenn on April 08, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood / No Comments

In a previous mission (see Finding Obama in the smallest Census geography) we delved down to see what data is available at the level of individual blocks. Unfortunately, as we noted there, the Census doesn’t provide a whole lot of useful data at the block level, since the results exclude sample data from the SF3 “long form” (or, post-2000, the American Community Survey). If we want to know more about a neighborhood we will need to think in slightly larger geographies, and seek data at the tract level or higher.

For this mission, we’ll be zooming in on the Park Slope neighborhood of Brooklyn, and gathering data on income, race, education, and the breakdown of owners and renters for a single census tract. Since it’s often helpful to be able to view data like this in the context of the surrounding neighborhood, subsequent missions will explore ways to make comparisons with this sort of data, either to other tracts or to larger geographies.

But for starters, our target: although defining the exact edges of a neighborhood is never easy – especially in dense, diverse areas, where even residents disagree over terminology, and where the ongoing processes of gentrification, urban decline, migration, and other demographic shifts continually redefine the categories – most observers would agree that Park Slope extends roughly north and west from Bartel Pritchard Square, at the lower corner of Prospect Park, with both 15th Street and Prospect Park itself providing something of an “edge.” Since edges are often exciting places to observe change, we will select an address along 15th Street, near the corner of 5th Avenue. Continue reading…


Building Blocks: Finding Obama in the smallest Census geography

Posted by Ezra Glenn on April 02, 2012
Missions, Reconnaissance, Shape Your Neighborhood / 1 Comment

The most basic unit of the U.S. Census is the individual household — that’s who fills out the surveys — but the Census won’t report data at the household level: in order to deliver on its promise of privacy and confidentiality (and thereby ensure our willingness to be enumerated), the Census always aggregates data before releasing it. This is important, and should become something of a mantra for would-be data analysts: all Census data is summary data. That said, we can still learn quite a lot at these micro-geographies, especially when we know what we are looking for.

Finding Barack

As an example of how to work with the building blocks of Census summary data – the individual “blocks” – let’s go back a bit in time and look at a very particular neighborhood in Chicago. At the time of the 2000 Census, President Obama was serving as a Senator from Illinois, living at 5429 S. Harper Avenue in Chicago. Starting with just an address, you can easily find how it fits into the census geography on the “American FactFinder” site: just visit the main Census site, click the menu-bar for Data, and select the link for American FactFinder.

Continue reading…


Constantly Improving: acs development versions

Posted by Ezra Glenn on March 29, 2012
Census, Code / No Comments

As noted elsewhere here on CityState, I’ve developed a package for working with data from the American Community Survey in the R statistical computing language. The most recent official version of the package is 0.8, which can be found on CRAN. Since the package is still in active development, I’ve decided to provide development snapshots here, for users who are looking to work with the latest code as I develop it.

I’m hoping that the next major release will be version 1.0, due out sometime this spring. As I work towards that, here is version 0.8.1, which can be considered the first “snapshot” headed toward this release.

acs_0.8.1.tar.gz

To install, simply download, start R, and type:

 

> install.packages("path/to/file/acs_0.8.1.tar.gz", repos=NULL, type="source")
> library(acs)

Updates include:

  • read.acs can now accept either a csv or a zip file downloaded directly from the FactFinder site, and it does a much better job (a) guessing how many rows to skip, (b) figuring out how to generate intelligent variable names for the columns, and (c) dealing with arcane non-numeric symbols used by FactFinder for some estimates and margins of error.
  • plot now includes a true.min= option, which allows you to specify whether you want to allow error bars to span into negative values (true.min=T, the default), or to bound them at zero (true.min=F, or some other numeric value). This seemed necessary because it looks silly to say “The number of children who speak Spanish in this tract is 15, plus or minus 80…” At the same time, if the variable turns out to be something like the difference between the income of Males and the income of Females in the geography, a negative value may make a lot of sense, and should be plotted as such. (A quick sketch of both of these updates appears after this list.)
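For example, here is a minimal sketch of both updates in action; the FactFinder filename is hypothetical, and the file is assumed to have already been downloaded:

library(acs)
my.data=read.acs("ACS_10_5YR_B01001_with_ann.zip")  # read.acs accepts the csv or the zip straight from FactFinder
plot(my.data, true.min=F)  # bound the error bars at zero rather than letting them dip below it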


acs Package Updated: version 0.8 now on CRAN

Posted by Ezra Glenn on March 18, 2012
Census, Code / 2 Comments

I’ve just released a new version of my acs package for working with the U.S. Census American Community Survey data in R, available on CRAN. The current version 0.8 includes all the original version 0.6 code, plus a whole lot more features and fixes. Some highlights:

  • An improved read.acs function for importing data downloaded from the Census American FactFinder site.
  • rbind and cbind functions to help create larger acs objects from smaller ones.
  • A new sum method to aggregate rows or columns of ACS data, dealing correctly with both estimates and standard errors.
  • A new apply method to allow users to apply virtually any function to each row or column of an acs data object.
  • A snazzy new plot method, capable of plotting both density plots (for estimates of a single geography and variable) and multiple estimates with error bars (for estimates of the same variable over multiple geographies, or vice versa). See sample plots below.

 

  • New functions to deal with adjusting the nominal values of currency from different years for the purpose of comparing between one survey and another. (See currency.convert and currency.year in the documentation.)
  • A new tract-level dataset from the ACS for Lawrence, MA, with dollar value currency estimates (useful to show off the aforementioned new currency conversion functions).
  • A new prompt method to serve as a helper function when changing geographic rownames or variable column names.
  • Improved documentation on the acs class and all of these various new functions and methods, with examples.

With this package, once you’ve found and downloaded your data from FactFinder, you can read it into R with a single command, aggregate multiple tracts into a neighborhood with another, generate a table of estimates and confidence intervals for your neighborhood with a third command, and produce a print-ready plot of your data (complete with error bars for the margins of error) with a fourth:

my.data=read.acs("some_data.csv")  # read the FactFinder download into an acs object
my.neighborhood=apply(my.data, FUN="sum", MARGIN=1, agg.term="My.Neighborhood")  # aggregate the tracts (rows) into one neighborhood
confint(my.neighborhood, conf.level=.95)  # table of estimates and 95% confidence intervals
plot(my.neighborhood, col="blue", err.col="violet", pch=16)  # print-ready plot with error bars

Already this package has come a long way, in large part thanks to the input of R users, so please check it out and let me know what you think — and how I can make it better.


Mel King Institute Training: ACS for CDCs

Posted by Ezra Glenn on February 11, 2012
Census, Good Causes / No Comments

On March 14, 2012, I’ll be working again with the Mel King Institute for Community Building to offer a half-day training in “Making Use of Local Census Data.” We designed the class for planners and community development practitioners working at the neighborhood-scale, and we’ll talk about ways to access the latest data from the U.S. Census American Community Survey (and how to use it responsibly).

Unlike earlier versions of the training, we’ll be working exclusively with the new American FactFinder (previously discussed in this post) to download data. We’ve also moved the class to one of MIT’s computer labs, and added an hour at the end as a “clinic,” so participants will get some hands-on time to dig up data on their own community.

For more information about the Mel King Institute, or to register for the training, see this page. See you there!


Supplemental Poverty Measure: some movement

Posted by Ezra Glenn on November 09, 2011
Census, Data, News/Commentary / No Comments

Update: a few weeks ago I posted this article calling attention to yet more delays in the rollout of the long-awaited Supplemental Poverty Measure from the Census Bureau. As it turns out, the Bureau has recently announced that this new index is in fact ready for prime time (see, for example, this press release).

More analysis and thoughts later after I am able to take a look, but I wanted to file something quick just to acknowledge the effort to get something out there.


Census Bureau: U.S. too poor to develop supplemental poverty measure

Posted by Ezra Glenn on October 17, 2011
Census, News/Commentary / No Comments

Last year, I was excited to learn that the Census Bureau was beginning work to establish a new, supplemental poverty measure, to address long-standing problems with the official statistic. The Bureau was quick to assure us that the official poverty measure would continue to be used to establish eligibility for government programs, and “will remain the definitive statistical measure,” but based on their elegant description of the new measure as “a more complex and refined statistic,” both the data fiends and the poverty scholars started to get excited. I was reminded of a great story by Barry Bluestone, Director of the Dukakis Center at Northeastern University: as he tells it, while trying to explain different ways to measure unemployment to a reporter, he became frustrated at the press’s unwillingness to delve into the complexity of these numbers. When the reporter explained, “I can’t print three different numbers for unemployment—people won’t follow that,” Bluestone retorted, “Have you ever read your paper?,” and went on to point out how almost every section had multiple measures: weather (temperature, wind-chill, humidity index), business (high, low, 52-week average), sports (batting average, slugging percentage, RBIs, OBP, and so on).

Unfortunately, it appears that the supplemental poverty measure is the latest good idea to fall victim to budget cuts: in a recent update (which received significantly less press than the original release), the Bureau reports:

Since the FY 2011 federal budget did not include the funding requested by the President for the Supplemental Poverty Measure (SPM) initiative, the Census Bureau and the Bureau of Labor Statistics do not currently have the resources necessary to move the Supplemental Poverty Measure from research mode to production mode. Without these additional resources, the September 2011 release date for the Supplemental Poverty Measure estimates suggested in the Interagency Technical Working Group document is not feasible.

The update goes on to note the groundwork undertaken on the topic over the past 18 months, including a few conferences and some very useful reports (see the Census Bureau page collecting Working Papers and Conference Presentations), and it promises a modified approach to yield at least partial results in the near future; but overall, you gotta figure it’s pretty bad when we can’t even afford to measure how poor we’ve become.


New American FactFinder: initial grumbling

Posted by Ezra Glenn on October 07, 2011
Census, Self-promotion / No Comments

The Census Bureau continues to roll out the latest data from the American Community Survey, last month announcing the availability of the first real nationwide data for 2010 in the form of the 1-year ACS Estimates. I’ve been conducting some trainings on how to get and use ACS data for local-level community planning (self-promotion: check out the Mel King Institute’s training in Boston, or wait for us to offer it again), which has prompted me to pay some more attention to the new “American FactFinder” platform, which in turn has prompted me to write this post.

Continue reading…
