Building Blocks: Finding Obama in the smallest Census geography

Posted by Ezra Glenn on April 02, 2012
Missions, Reconnaissance, Shape Your Neighborhood

The most basic unit of the U.S. Census is the individual household (that’s who fills out the surveys), but the Census won’t report data at the household level. To deliver on its promise of privacy and confidentiality (and thereby ensure our willingness to be enumerated), the Census always aggregates data before releasing it. This is important, and should become something of a mantra for would-be data analysts: all Census data is summary data. That said, we can still learn quite a lot even at the smallest geographies, especially when we know what we are looking for.

Finding Barack

As an example of how to work with the building blocks of Census summary data (the individual “blocks”), let’s go back a bit in time and look at a very particular neighborhood in Chicago. At the time of the 2000 Census, the future President Obama was serving as an Illinois state senator and living at 5429 S. Harper Avenue in Chicago. Starting with just an address, you can easily find where it fits into the census geography using the “American FactFinder” site: just visit the main Census site, click the Data menu, and select the link for American FactFinder.

FactFinder has a few quirks, mostly related to its attempts to be overly helpful, which can sometimes make it challenging to find exactly what you are looking for. The key is to be clear in your own mind about three things:

  • When: what year(s) do you want to know about? This will determine which survey and dataset you select. For right now, we want to know about the year 2000.
  • Where: what geography (or geographies) do you want to know about? This will help you choose among the millions of possible geographies. Importantly, you need to think both about exactly where (here, the Obama family’s address) and about how large an area (here, the smallest possible).
  • What: This is usually best left for last. Once we’ve selected a data set and a collection of Census geographies, we can zero in on available data for our topic.
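The When/Where/What checklist maps neatly onto a programmatic query as well. The sketch below is a minimal illustration, assuming the Census Bureau's data API at api.census.gov exposes the 2000 Summary File 1 under /data/2000/dec/sf1 and names total population P001001; both the endpoint path and the variable name are assumptions to verify against the API documentation. The function `block_query_url` is hypothetical and only builds a URL; it sends nothing.

```python
from urllib.parse import urlencode

# Assumed base URL of the Census Bureau data API; verify against its documentation.
API_BASE = "https://api.census.gov/data"


def block_query_url(year: int, variables: list[str], state: str,
                    county: str, tract: str, block: str) -> str:
    """Build (but do not send) a block-level query URL.

    When  -> the year and dataset in the path (assumed here: 'dec/sf1').
    Where -> the 'for' clause (the block) plus the 'in' clause (its parents).
    What  -> the 'get' list of table variables.
    """
    params = {
        "get": ",".join(variables),
        "for": f"block:{block}",
        "in": f"state:{state} county:{county} tract:{tract}",
    }
    # Keep ':' and ',' readable; spaces in the 'in' clause become '+'.
    return f"{API_BASE}/{year}/dec/sf1?" + urlencode(params, safe=":,")


# Illinois (17), Cook County (031), tract 4108 (written 410800), block 2000.
# "P001001" is an assumed variable name for total population.
url = block_query_url(2000, ["P001001"], "17", "031", "410800", "2000")
print(url)
```

Keeping the three questions in separate arguments makes it easy to change one of them (say, the year) without disturbing the others.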

To begin, then, let’s limit our search to data from the 2000 Decennial Census. Do this by clicking Topics in the left-hand menu, expanding the tab for Year, and selecting 2000.[1] Once you’ve done this, click Close, and we can find our address.

Just below Topics is a tab called Geographies. Click that, and then activate the tab labeled Address, which will open a form to search for an address. Enter 5429 S. Harper Ave., Chicago, IL 60615 in the fields provided and click GO.

The screen will refresh and offer all known Census geographies containing this address, providing a nice example of the way these entities nest within each other: starting with the State of Illinois and proceeding down, we see that this address is a part of Cook County, then Census Tract 4108, then Block Group 2, and finally tiny little “Block 2000,” the smallest unit available. Other possible paths and end-points for this address can be found as well: zip codes, legislative districts, metropolitan areas, school districts, and so on. For now, let’s stay small and look just at the Obamas’ block: select the link that says “Block 2000, Block Group 2, Census Tract 4108, Cook County, Illinois”, which should then fly up and be added to your selections in the upper left.
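The nesting shown on this screen is also built into the Census Bureau's geographic identifiers (GEOIDs), which concatenate fixed-width codes: a 2-digit state FIPS code (17 for Illinois), a 3-digit county code (031 for Cook County), a 6-digit tract code (tract 4108 is written 410800), and a 4-digit block number whose first digit is the block group. The class below is a hypothetical helper, a minimal sketch of that scheme:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockGeography:
    """One census block and the geographies it nests within (hypothetical helper)."""
    state: str   # 2-digit state FIPS code
    county: str  # 3-digit county FIPS code
    tract: str   # 6-digit tract code (tract 4108 is written 410800)
    block: str   # 4-digit block number

    @property
    def block_group(self) -> str:
        # A block's group is the first digit of its block number.
        return self.block[0]

    def geoid(self, level: str = "block") -> str:
        """Concatenate the fixed-width codes down to the requested level."""
        prefix = self.state + self.county + self.tract
        ids = {
            "state": self.state,
            "county": self.state + self.county,
            "tract": prefix,
            "block group": prefix + self.block_group,
            "block": prefix + self.block,
        }
        return ids[level]


obama_block = BlockGeography(state="17", county="031", tract="410800", block="2000")
for level in ("state", "county", "tract", "block group", "block"):
    print(f"{level:>11}: {obama_block.geoid(level)}")
```

Because each level's identifier is a prefix of the next, a question like "everything inside tract 4108" reduces to a simple prefix match on block GEOIDs.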

Now that we’ve selected the geography we are interested in, click Close on the grey Select Geographies window and we can see what sort of data is available from the year 2000 for this block: 250 tables in all.

Sadly, if you try to select some of these tables (say, “Profile of General Demographic Characteristics: 2000” or “Age Groups and Sex: 2000”), you are likely to be told that “The data you requested are not available.” At this tiny geographic level, the Census often cannot report data without disclosing confidential information about individuals, and so the tables need to be withheld.

But rest assured, there is some data in there somewhere. For starters, let’s look at one of the most basic tables, “Table P001: Total Population.” If you search for this table (it’s easiest to just type P001 in the Search for: field) and then click the link to it, you will find yourself routed to perhaps the least interesting table in the entire census: the raw count. According to this table, there were 237 people living in this census block in April 2000.

Notice, too, that once you arrive at actual results in American FactFinder, the navigation menus all change: instead of selecting different geographies and tables and adding them to your basket, you are presented with options for printing, saving, downloading, and even mapping the results of your query. We’ll return to these functions in a later mission; for now, note one other button on the screen, the one that says Back to Search, in the upper left. When working in FactFinder, avoid navigating with your browser’s own Back button.

Beyond the Raw Count

Using the Back to Search button to return to the previous screen, you will note that all of your previous selections have been preserved. Since we searched so narrowly for Table P001, only one table is listed in the middle of the screen; to see more, we’ll need to remove this search item from our selections in the upper left by clicking the red circle with the X next to “p001” under Your Selections. This returns us to the full list of 250 tables with potential block-level data.

About halfway down the first page you will see a table called “QT-PL: Race, Hispanic or Latino, and Age: 2000”, from which we can learn a little more about the composition of the neighborhood. This table, from the “Public Law 94-171” series, supplies the data used for legislative redistricting (and for judging its fairness), and therefore mostly contains data on the racial breakdown of the people living there. Following the link, we see that in the year 2000 this block contained 237 people: the same total as in P001, which is a good thing. We know that three of them should be Barack Obama, his wife Michelle, and their daughter Malia (Sasha wasn’t born until 2001, so she shouldn’t be in this count). From the table we can also see that of those 237 people, 92 were white alone, 108 were black alone, 21 were Asian alone, and the other 16 fell across a handful of other categories. (In this context, “alone” means reporting only one race, not living alone.)
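Checking that the pieces of one table add up to the total reported in another (as we just did by eye with P001) is a habit worth automating. Here is a minimal sketch using the counts quoted above, with the remaining 16 people lumped into a single "Other categories" bucket because the text does not break them out; `check_total` is a hypothetical helper:

```python
# Counts for Block 2000 as quoted above from QT-PL; "Other categories" lumps
# together the remaining race categories, which are not broken out here.
total_from_p001 = 237
race_counts = {
    "White alone": 92,
    "Black alone": 108,
    "Asian alone": 21,
    "Other categories": 16,
}


def check_total(parts: dict[str, int], expected: int) -> int:
    """Return the sum of the parts, raising if it disagrees with the expected total."""
    total = sum(parts.values())
    if total != expected:
        raise ValueError(f"parts sum to {total}, expected {expected}")
    return total


print(check_total(race_counts, total_from_p001))  # 237
for race, n in race_counts.items():
    # Share of the block's population in each category.
    print(f"{race:<17} {n:>4} ({n / total_from_p001:.1%})")
```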

If we back up once again and search for Table P012, we can see a finer breakdown by age and sex. The 237 people living in this block are almost evenly divided along gender lines: 118 males and 119 females. The table further breaks this down into age groups: Barack Obama was 38 at the time of the census in April 2000, so if he filled it out correctly, he should be one of the 14 males in the 35 to 39 years group.
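Finding which row of P012 a known person falls into is just a bracket lookup. The sketch below assumes the 23 per-sex age brackets of the 2000 SF1 P012 table (Under 5 years, 5 to 9 years, and so on up to 85 years and over); the bracket boundaries are an assumption to check against the table itself, and `bracket_label` is a hypothetical helper:

```python
import bisect

# Assumed lower bounds of the per-sex age brackets in SF1 Table P012.
BRACKET_STARTS = [0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45, 50,
                  55, 60, 62, 65, 67, 70, 75, 80, 85]


def bracket_label(age: int) -> str:
    """Return a P012-style label for the age bracket containing `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Index of the last bracket whose lower bound is <= age.
    i = bisect.bisect_right(BRACKET_STARTS, age) - 1
    if i == 0:
        return "Under 5 years"
    if i == len(BRACKET_STARTS) - 1:
        return "85 years and over"
    lo, hi = BRACKET_STARTS[i], BRACKET_STARTS[i + 1] - 1
    if lo == hi:
        return f"{lo} years"
    if hi == lo + 1:
        return f"{lo} and {hi} years"
    return f"{lo} to {hi} years"


males, females = 118, 119          # P012 totals quoted above
print(males + females)             # 237, matching P001
print(bracket_label(38))           # 35 to 39 years (Barack Obama, April 2000)
print(bracket_label(1))            # Under 5 years (Malia)
```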

Digging even deeper

Note also that there are only three girls under five years of age, one of whom should be Malia. If we back up, drop P012 from our selections, and find Table P012B instead (“Sex by Age (Black alone)”), we can see that only one of the girls under 5 years is black.[2] In principle, then, we may have identified an individual in the data.[3] Is this creepy? A little, but remember that the only way we were able to find a particular person like this (and we’re not really sure we found her; we’ve just found someone meeting the criteria we were looking for) was because we already knew where to look. It’s a lot more like Where’s Waldo than it is like Enemy of the State.

[Screenshot: Table P012B, “Sex by Age (Black alone)”, in American FactFinder]
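The narrowing in this paragraph amounts to scanning published cells for very small nonzero counts, which are the ones that outside knowledge can turn into an identification. Here is a minimal sketch using only the three counts quoted in this post and its footnote; the threshold `SMALL_CELL_LIMIT` is hypothetical and is not a Census Bureau rule:

```python
# Published counts for girls under 5 in Block 2000, as quoted in the text.
cells = {
    ("Female", "Under 5 years", "All races"): 3,          # Table P012
    ("Female", "Under 5 years", "Black alone"): 1,        # Table P012B
    ("Female", "Under 5 years", "Two or more races"): 0,  # Table P012G
}

SMALL_CELL_LIMIT = 2  # hypothetical cutoff for "small enough to worry about"


def risky_cells(table, limit=SMALL_CELL_LIMIT):
    """Return the cells whose count is nonzero but at or below the limit."""
    return {key: n for key, n in table.items() if 0 < n <= limit}


for (sex, age, race), n in risky_cells(cells).items():
    print(f"{sex}, {age}, {race}: {n}")
```

Zero cells are skipped here, but they can leak information too: the zero in P012G tells us that none of the three young girls was tabulated as two or more races.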

Next steps

Like some strange inverted version of mountain climbing, way down here at the block level of census geography the air gets a little thin: the only information available is what can be generated from the Summary File 1 (SF1) tables, the 100-percent count: raw counts, age, race, whether people own or rent their homes, and a little about how the people in a household are related to one another. To find the really fun stuff from the Census “long form” or the American Community Survey (income, ancestry, education, migration, commuting patterns, employment, and so on), we need to expand our geography a bit. We’ll do that next, in another mission.

Footnotes:

[1] Note that we could also have expanded “Dataset” and selected one of the products from 2000, but for now let’s keep it open and see all available data from this year.

[2] There is actually a great deal of controversy around the question of whether Barack and Michelle listed themselves and their daughter as “Black” alone or under one of the multiple-race categories. Interestingly, if you look instead at Table P012G, “Sex by Age (Two or More Races)”, you will see zero girls under 5 years of age.

[3] Actually, in practice things are a bit more complicated than this overly sensationalist example suggests: to guarantee privacy when dealing with small geographies, the Census Bureau applies “data-swapping” procedures when it creates these tables, randomly re-assigning individuals to neighboring blocks. So a “1” in a table like this (or any other small number) may actually represent someone in an adjacent geography (and likewise, a “0” may simply mean that someone is there but is being counted somewhere else). Sorry, Big Brother!
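To make the idea of swapping concrete, here is a toy version. It is a hypothetical simplification, not the Census Bureau's actual procedure: it swaps whole household records between blocks, matching on one key characteristic (household size), and the `Household` fields, swap rate, seed, and sample records are all made up for illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Household:
    block: str  # block the household is tabulated in
    size: int   # key characteristic that the toy swap preserves
    race: str   # non-key attribute that may move between blocks


def swap(households, rate, seed=0):
    """Toy data swap: for randomly chosen households, exchange blocks with a
    partner of the same size living in a different block. Illustrative only."""
    rng = random.Random(seed)
    out = list(households)
    n_targets = max(1, int(rate * len(out)))
    for i in rng.sample(range(len(out)), n_targets):
        partners = [j for j, h in enumerate(out)
                    if h.size == out[i].size and h.block != out[i].block]
        if partners:
            j = rng.choice(partners)
            # Both right-hand sides are evaluated before either assignment.
            out[i], out[j] = (replace(out[i], block=out[j].block),
                              replace(out[j], block=out[i].block))
    return out


households = [  # hypothetical records, not Census data
    Household("A", 2, "X"), Household("A", 3, "Y"),
    Household("B", 2, "Y"), Household("B", 3, "X"),
]
swapped = swap(households, rate=0.5, seed=1)
print(Counter((h.block, h.race) for h in swapped))
```

The block-level counts of households by size come out unchanged, while attributes outside the matching key (here `race`) can end up tabulated in a different block, which is why a "1" or a "0" in a small block cannot be taken at face value.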


1 Comment to Building Blocks: Finding Obama in the smallest Census geography

  • Anon says:

    Your article, by way of the census, speaks to the powerful truth that it takes relatively little information to identify a person within a mass group.

    The Wall Street Journal noted in 2010 that only 33 bits of information (bits in the digital, 1/0 sense, rather than “pieces”) would be required to identify a person among the world’s entire population. The article also examines narrowing by zip code, age, or gender.

    Arvind Narayanan at Stanford also runs a blog dedicated to this phenomenon which explores a lot of different research angles on the subject of identifying people.

    Lots of interesting and sometimes chilling research in this field.
