Pitfalls of Working with Census Data

Posted by Ezra Glenn on July 27, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood

Previous missions have demonstrated a whole lot of things you can do with census data. Here are a few of the problems you can get yourself into.

Census Geography Pitfalls

  • Unequal tracts: Despite what you may think, not all Census tracts (and their composite block groups and blocks) are created equal. The Census Bureau tries to structure their geography so that all tracts will be approximately the same size (about 4,000 people), but in practice there is a pretty large range (between 1,500 and 8,000 people per tract). If you are looking at raw numbers (counts of any sort), be sure to think about the overall population—it’s the denominator you’ll need to put the figures in perspective; conversely, if you are looking at percentages, remember that a small percentage of a large tract could actually be more people than a large percentage of a very small one.
  • Overlapping districts: Unfortunately, although the formal “pyramid” of Census geography is well-structured—building from block to block group to tract and so on up—our political and cultural divisions are not always so straightforward: cities sometimes spread across county lines, metropolitan areas may even cross state lines, and legislative districts have become a gerrymandered mess that would drive any rational cartographer to drink. As a result, there may be times when Census geographers have been forced to choose between a strictly “nested” geography that ignores higher-order political elements, and one with intermediate levels that do not fit neatly within each other.
  • Confusing or ambiguous place names: Partially related to the previous point, and partially due to the general orneriness of the culture (or perhaps the species), there are often times when the same name will occur in multiple places in Census geography. The name “New York” refers to a state, a metro region, a city, a county (strangely, one that is smaller than its city), and even an avenue in Atlantic City (or on the Monopoly Board). Luckily, once you get down to the level of census tracts and below, you enter the realm of pure-and-orderly numbers, and can largely avoid this trap—they are even sometimes referred to as “logical record numbers,” or LOGRECNO—although it’s a lot less fun to say “AR census tract 9803 block group 3” when you could be saying “Goobertown, Arkansas”.
  • Changing boundaries: Occasionally the Census Bureau needs to redraw the lines for some particular location—perhaps a city has annexed new land, or a large county has been split by an act of the state legislature. In these situations, you may see a sharp rise (or drop) in the counts from one Census to the next. For example, according to the 2000 census, the city of Bigfork, Montana had 1,421 people; in the 2010 census, this figure had grown to 4,270—a seeming tripling of the population. However, upon closer scrutiny, it turns out that most of this increase was the result of a change in census boundaries. (These situations may also exacerbate some of the previous problems.)
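To make the first and last of these pitfalls concrete, here is a minimal Python sketch. The two tracts and their figures are hypothetical, invented purely for illustration; only the Bigfork counts come from the example above.

```python
# Hypothetical tracts (made-up numbers, for illustration only).
tracts = {
    "small tract": {"population": 1500, "share": 0.20},
    "large tract": {"population": 8000, "share": 0.05},
}


def people_affected(population, share):
    """Convert a share of a tract's population into a head count."""
    return round(population * share)


# A large percentage of a small tract can still be fewer people
# than a small percentage of a large one.
counts = {
    name: people_affected(t["population"], t["share"])
    for name, t in tracts.items()
}


def percent_change(old, new):
    """Percent change from an earlier count to a later one."""
    return (new - old) / old * 100


# Bigfork, Montana: 1,421 people in 2000 and 4,270 in 2010.
# The arithmetic shows a large jump, but it cannot tell you whether
# people moved in or the boundary moved out.
bigfork_change = percent_change(1421, 4270)

for name, count in counts.items():
    print(f"{name}: {count} people")
print(f"Bigfork change, 2000 to 2010: {bigfork_change:.1f}%")
```

The point of the sketch is only that a percentage needs its denominator, and that a jump between two censuses is worth checking against the boundary definitions before it is read as growth.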

Additional Census Pitfalls

Here are some common mistakes to avoid that are not related to geography, but may still trip you up:

  • Hispanic includes all races: Unlike the terms White, Black, and Asian, the Census uses the word Hispanic to refer to ethnicity, not race (which is a pretty poorly defined category in its own right). For this reason, individuals counted as “Hispanic” will also be counted in one of the racial categories. As a result, if you simply add up all the people in all these groups, you will be double-counting some. Sometimes the responses are carefully presented in categories such as “White, non-Hispanic,” and so on, to avoid this confusion.
  • Families vs. households: According to the Census definitions, a family is “a group of two people or more related by birth, marriage, or adoption and residing together”; in contrast, a household “consists of all the people who occupy a housing unit,” whether related or not, including “lodgers, foster children, wards, or employees who share the housing unit.” A person living alone in a housing unit, or a group of unrelated people sharing one, is counted as a household but not as a family. Often people use the terms interchangeably, but there is a difference; it’s best to figure out which one matters to you and stick with it, especially when making comparisons between places or across time.
  • Download overload: As mentioned elsewhere, the Census Bureau is a model of government transparency and open-access data. Beyond some understandable limitations on personal and sensitive data, pretty much everything they collect is available in one form or another (or really, hundreds of different forms) from their American FactFinder download site. However, just because it’s there doesn’t mean you need to download it. Today’s cheap memory and fast internet tend to encourage data users to download far more than they need: for example, asking for data at the block or block group level when a higher level of geographical summary would be fine, or (conversely) downloading data on the entire state when you really only need a specific region. Bigger datasets take longer for computers to work with, but even more importantly, they will confuse you. Treat free data like free beer and download responsibly.
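The double-counting problem with Hispanic ethnicity is easy to see with a little arithmetic. The sketch below uses hypothetical counts for a single made-up place (none of these numbers come from the Census); the category names are simplified for illustration.

```python
# Hypothetical counts for one place (made-up, for illustration only).
total_population = 1000

# Race categories: every person appears in exactly one of these.
by_race = {"White": 600, "Black": 250, "Asian": 100, "Other": 50}

# Ethnicity: these 200 people are ALSO counted somewhere in by_race.
hispanic_any_race = 200

# Naive approach: add the ethnicity count on top of the race counts.
naive_total = sum(by_race.values()) + hispanic_any_race
double_counted = naive_total - total_population

# Safer approach: use mutually exclusive categories, such as
# "White, non-Hispanic", plus a single "Hispanic (any race)" group.
hispanic_by_race = {"White": 150, "Black": 20, "Asian": 5, "Other": 25}
non_hispanic = {
    race: by_race[race] - hispanic_by_race[race] for race in by_race
}
exclusive_total = sum(non_hispanic.values()) + hispanic_any_race

print(f"Naive total: {naive_total} ({double_counted} double-counted)")
print(f"Exclusive total: {exclusive_total}")
```

With mutually exclusive categories the groups sum back to the total population, which is a quick sanity check worth running on any table that mixes race and ethnicity.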
