Shape Your Neighborhood

Pitfalls of Working with Census Data

Posted by Ezra Glenn on July 27, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood

Previous missions have demonstrated a whole lot of things you can do with census data. Here are a few of the problems you can get yourself into.

Census Geography Pitfalls

  • Unequal tracts: Despite what you may think, not all Census tracts (and their component block groups and blocks) are created equal. The Census Bureau tries to structure its geography so that all tracts will be approximately the same size (about 4,000 people), but in practice there is a pretty large range (between 1,500 and 8,000 people per tract). If you are looking at raw numbers (counts of any sort), be sure to think about the overall population: it’s the denominator you’ll need to put the figures in perspective. Conversely, if you are looking at percentages, remember that a small percentage of a large tract could actually be more people than a large percentage of a very small one.
  • Overlapping districts: Unfortunately, although the formal “pyramid” of Census geography is well-structured—building from block to block group to tract and so on up—our political and cultural divisions are not always so straightforward: cities sometimes spread across county lines, metropolitan areas may even cross state lines, and legislative districts have become a gerrymandered mess that would drive any rational cartographer to drink. As a result, there may be times when Census geographers have been forced to choose between a strictly “nested” geography that ignores higher-order political elements, and one with intermediate levels that do not fit neatly within each other.
  • Confusing or ambiguous place names: Partly related to the previous point, and partly due to the general orneriness of the culture (or perhaps the species), the same name will often occur in multiple places in Census geography. The name “New York” refers to a state, a metro region, a city, a county (strangely, one that is smaller than its city), and even an avenue in Atlantic City (or on the Monopoly board). Luckily, once you get down to the level of census tracts and below, you enter the realm of pure-and-orderly numbers and can largely avoid this trap (in the Census summary files, each geographic record even gets its own “logical record number,” or LOGRECNO), although it’s a lot less fun to say “AR census tract 9803 block group 3” when you could be saying “Goobertown, Arkansas.”
  • Changing boundaries: Occasionally the Census Bureau needs to redraw the lines for some particular location: perhaps a city has annexed new land, or a large county has been split by an act of the state legislature. In these situations, you may see a sharp rise (or drop) in the counts from one Census to the next. For example, according to the 2000 census, Bigfork, Montana (an unincorporated census-designated place) had 1,421 people; in the 2010 census, this figure had grown to 4,270, a seeming tripling of the population. On closer scrutiny, however, it turns out that most of this increase was the result of a change in census boundaries. (These situations may also exacerbate some of the previous problems.)
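
To make the denominator and identifier points concrete, here is a minimal R sketch. The two tracts and their percentages are invented for illustration (not real Census figures), and the county code in the identifier example is a hypothetical stand-in; only the Arkansas state code (05) and the tract and block-group numbers from the text are real.

```r
# Part 1: percentages need a denominator.
# Hypothetical tracts near the two ends of the 1,500-8,000 range;
# the shares are invented for illustration, not real Census data.
tracts <- data.frame(
  name       = c("large tract", "small tract"),
  population = c(8000, 1500),
  share      = c(0.10, 0.40)   # proportion of residents with some trait
)

# Multiply each share by its tract's total population to get a head count
tracts$count <- tracts$share * tracts$population
print(tracts)

# Part 2: tract-level identifiers are orderly numbers.
# A block-group GEOID concatenates state (2 digits), county (3),
# tract (6, with two implied decimal places), and block group (1).
make_bg_geoid <- function(state, county, tract, block_group) {
  sprintf("%02d%03d%06d%d",
          as.integer(state), as.integer(county),
          as.integer(round(tract * 100)),   # tract 9803 is coded 980300
          as.integer(block_group))
}

# Arkansas is state 05; county 031 is a hypothetical stand-in here
make_bg_geoid(5, 31, 9803, 3)   # "050319803003"
```

With these made-up numbers, the 10% share of the large tract describes 800 people, while the 40% share of the small tract describes only 600.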

Freedom of Information: Public Records Requests

Posted by Ezra Glenn on July 20, 2012
Missions, Reconnaissance, Shape Your Neighborhood

Many of the “reconnaissance” missions found here either explore some existing public database (such as the Census, via FactFinder) or demonstrate innovative tools for gathering new data for your planning efforts. Sometimes, however, you will want to use information held by City Hall or some other public agency, and it is not entirely clear how to gain access to it. When access to information is the problem, it is important to understand the notion of public records and the proper application of the federal Freedom of Information Act (and any parallel state legislation).

What the law says

President Lyndon B. Johnson signed the Freedom of Information Act (known as FOIA) into law on July 4, 1966 (appropriately, albeit somewhat ironically, given the players involved), thereby establishing a consistent process under which citizens could enforce their right to access government records. Importantly, as legal scholars and civil rights advocates will be quick to note, the right itself existed prior to the Act: in a democratic society, public records are already public, and we have every right to view them, copy them, analyze and distribute them, and otherwise be as nosy as we want to be. Remember: it’s government of the people, by the people, and we shouldn’t be shy about asking what it’s up to, even when it operates behind closed doors.

In the nearly five decades since, FOIA requests have become a standard step in investigative journalism, citizen government-accountability movements, and even high-profile lawsuits. Nationwide, more than 500,000 requests are made each year, resulting (eventually; see below) in one of the most abundant sources of information about an astonishing array of topics, from arrest records to political campaign spending, from tax assessments to high-school dropout rates.1

Pitfalls of Working with Time-Series Data

Posted by Ezra Glenn on April 24, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In addition to the general caution against using past data for projecting future conditions (and the need for equally spaced time intervals mentioned above), the particulars of time series data require additional attention to some special issues.

Inflation and Constant Dollars

Any time series that deals with dollars (or yen, pounds sterling, wampum, or other forms of currency) must confront the fact that the value of money changes over time. If you are simply making a time series showing the shrinking value of the dollar, that’s fine — it’s what you want to show — but if you want to show something else (say, changes in wages or home prices), then you will need to correct your data to some common base. Usually this is done by starting with a base year (often the start or end of the series, or the “current” year) and adjusting values based on changes to some official inflation statistic (e.g., the consumer price index).1
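
The adjustment itself is simple arithmetic: a constant-dollar value equals the nominal value multiplied by the ratio of the index in the base year to the index in the year the value was recorded. A minimal R sketch, with invented wages and index values standing in for real CPI figures:

```r
# Hypothetical nominal wages and price-index values (not real data)
year    <- c(2000, 2005, 2010)
nominal <- c(40000, 42000, 45000)
cpi     <- c(100, 110, 125)

# Choose a base year (here the last year in the series) and rescale every
# value by the ratio of the base-year index to that year's index
base <- which(year == 2010)
real <- nominal * cpi[base] / cpi

data.frame(year, nominal, real = round(real))
```

Here nominal wages rise every period, but expressed in 2010 dollars they fall from 50,000 to 45,000.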

Growth and Change to the Underlying Population

Over time — especially over long periods — the population of a place can change quite a lot, both in terms of overall numbers and the demographic components. As with inflation, this may be precisely the change that you are interested in observing and predicting (as in the first examples in this chapter), but at times it can introduce a spurious or intervening variable into your analysis.
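
When population growth is a nuisance rather than the subject of study, one common remedy is to convert counts into rates per resident (here, per 1,000 residents). A minimal R sketch; the event counts and population figures are invented for illustration:

```r
# Hypothetical annual counts of some event (say, library visits) and the
# population of the same place in each year; invented for illustration
year       <- 2001:2005
visits     <- c(50000, 52000, 55000, 58000, 61000)
population <- c(100000, 106000, 113000, 120000, 128000)

# A rate per 1,000 residents removes the effect of a growing denominator
per.1000 <- 1000 * visits / population
round(per.1000, 1)
```

The raw counts climb every year, while the rate per 1,000 residents slowly falls: the apparent growth simply tracks the larger population.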

Examining Historical Growth III: The forecast() package

Posted by Ezra Glenn on April 21, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In our last mission we used R to plot a trend line for population growth in Houston, based on historical data from the past century. Depending on which of two different methods we used, we arrived at an estimate for the city’s 2010 population of 2,144,531 (based on the 100-year growth trend for the city) or 2,225,125 (based on the steeper growth trend of the past fifty years). Looking now at the official Census count for 2010, it turns out that our guesses are close, but both too high: the actual reported figure for 2010 is 2,099,451.
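
How close is “close”? Using only the three figures quoted above, a quick R calculation expresses each projection’s error as a percentage of the actual 2010 count:

```r
# Figures quoted in the text
actual    <- 2099451   # 2010 Census count for Houston
est.100yr <- 2144531   # projection from the 100-year trend
est.50yr  <- 2225125   # projection from the 50-year trend

# Error relative to the actual count, in percent (positive = too high)
pct.error <- 100 * (c(est.100yr, est.50yr) - actual) / actual
round(pct.error, 1)
```

The 100-year trend overshoots by a little over 2 percent, and the 50-year trend by about 6 percent.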

It would have been surprising to have guessed perfectly based on nothing other than a linear trend, and the fact that we came as close as we did speaks well of this sort of “back of the envelope” projection technique (at least in cases of steady growth). But there was a lot of information contained in those data points that we essentially ignored: each of our two trend lines was really based on nothing more than a start point and an end point.
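
To see what an endpoint-only trend throws away, one option (shown here with base R’s lm(), which is a different technique from anything in the forecast package) is to compare its slope with a least-squares line fitted to every observation. The series below is invented for illustration and is not Houston’s actual population:

```r
# Invented decennial series with an irregular middle (not real Houston data)
year <- seq(1900, 2000, by = 10)
pop  <- c(100, 130, 150, 200, 260, 280, 300, 390, 430, 450, 500)

# Slope implied by the two endpoints alone (people per year)
endpoint.slope <- (pop[11] - pop[1]) / (year[11] - year[1])

# Slope of a least-squares line that uses all eleven observations
fit <- lm(pop ~ year)
ls.slope <- unname(coef(fit)["year"])

c(endpoint = endpoint.slope, least.squares = ls.slope)

# 2010 projections from each approach
pop[11] + 10 * endpoint.slope
predict(fit, newdata = data.frame(year = 2010))
```

With these made-up numbers the two slopes differ only slightly; with noisier data the gap can be much larger, because a single unusual start or end year drives the endpoint estimate entirely.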

A more sophisticated set of tools for making projections, which may be able to extract some extra meaning from the variation contained in the data, is provided in R by the excellent forecast package, developed by Rob Hyndman of Monash University in Australia. To access these added functions, you’ll need to install and load it:

> install.packages("forecast")   # the package name must be quoted here
> library(forecast)

Time-series in R: an object with class

Although R is perfectly happy to help you analyze and plot time-series data organized in vectors and dataframes, it actually has a specialized object class for this sort of thing, created with the ts() function. Remember: R is an “object-oriented” language. Every object (a variable, a dataframe, a function, a time series) is associated with a certain class, which helps the language figure out how to manage and interact with it. To find the class of an object, use the class() function:

> a=c(1,2)
> class(a)
[1] "numeric"
> a=TRUE
> class(a)
[1] "logical"
> class(plot)
[1] "function"
> a=ts(1)
> class(a)
[1] "ts"
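
Creating a useful time-series object takes little more than a start date and the spacing between observations. A minimal sketch for decennial data, using placeholder values (not real Houston figures): since time is measured in years here, deltat = 10 marks observations ten years apart, which is equivalent to frequency = 0.1.

```r
# Placeholder decennial values for 1900-2000 (invented, not real figures)
pop <- c(10, 12, 15, 19, 24, 30, 37, 45, 54, 64, 75)

# deltat = 10: observations are ten years apart (frequency = 0.1 per year)
pop.ts <- ts(pop, start = 1900, deltat = 10)

class(pop.ts)       # "ts"
frequency(pop.ts)   # 0.1
time(pop.ts)        # the dates 1900, 1910, ..., 2000 travel with the data
```

Because the dates are stored inside the object, functions such as time(), plot(), and the forecasting tools can read them directly instead of needing a separate vector of years.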

Examining Historical Growth II: Using the past to predict the future

Posted by Ezra Glenn on April 12, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In our previous mission we plotted population numbers in Houston for 1900–2000, to start to understand the growth trend for that city. Now, what if we didn’t have access to the latest Census figures, and we wanted to try to guess Houston’s population for 2010, using nothing but the data from 1900–2000?

One place to start would be with the 2000 population (1,953,631), adjusting it a bit based on historical trends. With 100 years’ worth of data, we can do this in R with a simple bit of vector math.1

> attach(houston.pop) # optional, see footnote
> population[11]      # don't forget: 11, not 10, data points
[1] 1953631
> annual.increase=(population[11]-population[1])/100   # watch the parentheses!
> population[11]+10*annual.increase
[1] 2144531

Remember that we actually have eleven data points, since we have both 1900 and 2000, so we need to specify population[11] as our endpoint. But those eleven points span only ten decade intervals (100 years), so we divide by 100 to get the annual increase. Adding ten times this increase to the 2000 population, we get an estimate for 2010 of 2,144,531. (Bonus question: based on this estimated annual increase, in what year would Houston have passed the two-million mark?2)
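
One way to work the bonus question in R, recovering the annual increase from the two figures already given (the 2000 count and the 2010 estimate) rather than from the full data set:

```r
pop.2000 <- 1953631
est.2010 <- 2144531                             # 100-year-trend estimate from above
annual.increase <- (est.2010 - pop.2000) / 10   # about 19,090 people per year

# Years of growth needed to cover the remaining gap to 2,000,000
years.needed <- (2000000 - pop.2000) / annual.increase
2000 + years.needed
```

Under this straight-line assumption the answer falls a little over two years past 2000, that is, sometime in 2002.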

Examining Historical Growth I: Basic trends

Posted by Ezra Glenn on April 11, 2012
Data, Missions, Shape Your Neighborhood, Simulation

The nature of predictions

To paraphrase John Allen Paulos, author of A Mathematician Reads the Newspaper, all expert predictions can essentially be restated in one of two ways: “Things will continue roughly as they have been until something changes”; and its corollary, “Things will change after an indeterminate period of stability.” Although these statements are both true and absurd, they contain a kernel of wisdom: simply assuming a relative degree of stability and painting a picture of the future based on current trends is the first step of scenario planning. The trick, of course, is to never completely forget the “other shoe” of Paulos’s statement: as the disclaimer on every investment offering states, “Past performance is not a guarantee of future results”; at some point in the future our present trends will no longer accurately describe where we are headed. (We will deal with this as well, with a few “safety valves.”)

In the second stage of the Rational Planning Paradigm (covered in the background sections of the book) we should have gathered information on both past and present circumstances related to our planning effort. If we are looking at housing production, we might have data on annual numbers of building permits and new subdivision approvals, mortgage rates, and housing prices; if we are looking at public transportation we might need monthly ridership numbers, information on fare changes, population and employment figures, and even data on past weather patterns or changes in vehicle ownership and gas prices. The first step of projection, therefore, is to gather relevant information and get it into a form that you can use.

Since we will be thinking about changes over time in order to project a trend into the future, we’ll need to make sure that our data has time as an element: a series of data points with one observation for each point or period of time is known as a time series. The exact units of time are not important (they could be days, months, years, decades, or something different), but it is customary (and important) to obtain data where points are regularly spaced at even intervals.1 Essentially, time-series data is a special case of multivariate data in which we treat time itself as an additional variable and look for relationships as it changes. Luckily, R has some excellent functions and packages for dealing with time-series data, which we will cover in passing below. For starters, however, let’s consider a simple example to begin thinking about what goes into projections.
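
Checking for even spacing is easy to automate. A minimal sketch in R; the helper name is.evenly.spaced and the irregular survey years are hypothetical:

```r
# Returns TRUE when every gap between successive time points is the same
is.evenly.spaced <- function(t) {
  gaps <- diff(sort(t))
  length(gaps) > 0 && isTRUE(all.equal(gaps, rep(gaps[1], length(gaps))))
}

census.years <- seq(1900, 2000, by = 10)
survey.years <- c(1990, 1995, 2000, 2008, 2010)   # hypothetical irregular series

is.evenly.spaced(census.years)   # TRUE
is.evenly.spaced(survey.years)   # FALSE
```

Using all.equal() rather than == tolerates tiny floating-point differences when the time points are fractional (for example, months expressed as fractions of a year).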

A Richer Neighborhood Profile, Part I: Getting tract-level data

Posted by Ezra Glenn on April 08, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood

In a previous mission (see Finding Obama in the smallest Census geography) we delved down to see what data was available at the level of individual blocks. Unfortunately, as we noted there, the Census doesn’t provide a whole lot of useful data at the block level, since block-level results exclude the sample data collected on the “long form” (published in Summary File 3, or SF3) or, after 2000, by the American Community Survey. If we want to know more about a neighborhood we will need to think in slightly larger geographies, and seek data at the tract level or higher.

For this mission, we’ll be zooming in on the Park Slope neighborhood in Brooklyn, and gathering data on income, race, education, and the breakdown of owners and renters for a single census tract. Since it’s often helpful to view data like this in the context of the surrounding neighborhood, subsequent missions will explore ways to compare it with other tracts or with larger geographies.

But first, our target. Defining the exact edges of a neighborhood is never easy, especially in dense, diverse areas, where even residents disagree over terminology and where gentrification, urban decline, migration, and other demographic shifts continually redefine the categories. Still, most observers would agree that Park Slope extends roughly north and west from Bartel Pritchard Square, at the lower corner of Prospect Park, with both 15th Street and Prospect Park itself providing something of an “edge.” Since edges are often exciting places to observe change, we will select an address along 15th Street, near the corner of 5th Avenue.

Building Blocks: Finding Obama in the smallest Census geography

Posted by Ezra Glenn on April 02, 2012
Missions, Reconnaissance, Shape Your Neighborhood

The most basic unit of the U.S. Census is the individual household (that’s who fills out the surveys), but the Census won’t report data at the household level: in order to deliver on its promise of privacy and confidentiality (and thereby ensure our willingness to be enumerated), the Census always aggregates data before releasing it. This is important, and should become something of a mantra for would-be data analysts: all Census data is summary data. That said, we can still learn quite a lot from these micro-geographies, especially when we know what we are looking for.
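
To illustrate what “summary data” means, here is a small R sketch that rolls up invented household records (the kind of data the Census never releases) into per-block totals, roughly the form in which block data is published. The block numbers and values are hypothetical:

```r
# Invented household records; the Census never publishes data like this
households <- data.frame(
  block   = c("1001", "1001", "1001", "1002", "1002"),
  persons = c(2, 4, 1, 3, 5),
  owner   = c(1, 0, 1, 1, 0)   # 1 = owner-occupied, 0 = renter-occupied
)

# Sum each variable within each block: totals survive, individuals do not
block.summary <- aggregate(cbind(persons, owner) ~ block,
                           data = households, FUN = sum)
names(block.summary) <- c("block", "total.persons", "owner.households")
block.summary
```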

Finding Barack

As an example of how to work with the building blocks of Census summary data (the individual “blocks”), let’s go back a bit in time and look at a very particular neighborhood in Chicago. At the time of the 2000 Census, the future President Obama was serving as an Illinois state senator, living at 5429 S. Harper Avenue in Chicago. Starting with just an address, you can easily find how it fits into the census geography on the “American FactFinder” site: just visit the main Census site, click the Data menu, and select the link for American FactFinder.

SYN on

Posted by Ezra Glenn on October 18, 2011
Self-promotion, Shape Your Neighborhood

I was pleased to see my book, Shape Your Neighborhood: How to Use Public Data for Community Advocacy and Activism, listed for sale on  My guess is that they don’t know how subversive the book will be—including a section on this great visualization from Flowing Data on the spread of Walmart.  (Of course, how could they know about this section, since I haven’t even finished writing the book yet…)
