We’re pleased to announce the creation of a new mailing list for the acs.R package. The “acs” package allows users to download, manipulate, analyze, and visualize data from the American Community Survey in R; the “acs-r” e-mail list allows members to keep in touch and share information about the package, including updates from the development team concerning improvements, user questions and help requests, worked examples, and more. To register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.
A very nice user wrote the following in an email to me about the latest version of the acs.R package:
> Thanks for providing such a wonderful package in R. I'm having difficulty defining a geo at the block group level. Would you mind sharing an example with me?
I responded via email, but thought that my answer, which took the form of a short worked example, might be helpful to others, so I am posting it here as well. Here’s what I said:
To showcase how the package can create new census geographies based on smaller units such as block groups, let’s look in my home state of Massachusetts, in Middlesex County. If I wanted to get info on all the block groups for tract 387201,[1] I could create a new geo like this:
> my.tract=geo.make(state="MA", county="Middlesex", tract=387201, block.group="*", check=T)
Testing geography item 1: Tract 387201, Blockgroup *, Middlesex County, Massachusetts .... OK.
(This might be a useful first step, especially if I didn’t know how many block groups there were in the tract, or what they were called. Also, note that check=T is not required, but can often help ensure you are dealing with valid geos.)
If I then wanted to get very basic info on these block groups – say, table number B01003 (Total Population), I could type:
> total.pop=acs.fetch(geo=my.tract, table.number="B01003")
> total.pop
ACS DATA:
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213
Block Group 3 1010 +/- 156
Block Group 4 938 +/- 214
Here we can see that block.group="*" has yielded the four actual block groups in the tract.
Now, if instead of wanting all of them, we only wanted the first two, we could just type:
> my.bgs=geo.make(state="MA", county="Middlesex", tract=387201, block.group=1:2, check=T)
Testing geography item 1: Tract 387201, Blockgroup 1, Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387201, Blockgroup 2, Middlesex County, Massachusetts .... OK.
> bg.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> bg.total.pop
ACS DATA:
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213
Now, if we wanted to add in some block groups from tract 387100 (a.k.a. “tract 3871”, but remember: we need those trailing zeroes), say, block groups 2 and 3, we could enter:
> my.bgs=my.bgs+geo.make(state="MA", county="Middlesex", tract=387100, block.group=2:3, check=T)
Testing geography item 1: Tract 387100, Blockgroup 2, Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387100, Blockgroup 3, Middlesex County, Massachusetts .... OK.
> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA:
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213
Block Group 2 827 +/- 171
Block Group 3 1821 +/- 236
Note that the short row names can be confusing, as in this example, but if you type:
> geography(new.total.pop)
           NAME state county  tract blockgroup
1 Block Group 1    25     17 387201          1
2 Block Group 2    25     17 387201          2
3 Block Group 2    25     17 387100          2
4 Block Group 3    25     17 387100          3
you can see that the two entries for “Block Group 2” are actually in different tracts. (Also note: you can combine block groups and other levels of geography, all in a single geo object…)
And now, to show off the coolest part! Let’s say I don’t just want to get data on the four blockgroups, but I want to combine them into a single new geographic entity. Before downloading, I could simply say:
> combine(my.bgs)=T
> combine.term(my.bgs)="Select Blockgroups"
> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA:
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                   B01003_001
Select Blockgroups 6281 +/- 481.733328720362
And see – voila! – it sums the estimates and deals with the margins of error, so you don’t need to get your hands dirty with square roots and standard errors and all that messy stuff.
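For the curious, the arithmetic behind that combined figure can be checked by hand. The sketch below assumes the combined margin of error is the square root of the sum of the squared margins (the usual approximation for sums of ACS estimates), which does reproduce the number printed above. The vector names are made up for illustration and are not part of the package.

```r
# Estimates and 90% margins of error for the four block groups shown above
estimates <- c(2681, 952, 827, 1821)
moes      <- c(319, 213, 171, 236)

# Sum the estimates; combine the margins as the root of the summed squares
# (assumed here to be what the package does when combine=T)
combined.estimate <- sum(estimates)
combined.moe      <- sqrt(sum(moes^2))

combined.estimate
combined.moe
```

The two values should match the “Select Blockgroups” row in the output above.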
You can even create interesting nested geo.sets, where some of the lower levels are combined, like this:
> combine.term(my.bgs)="Select Blockgroups, Tracts 387100 and 387201"
> more.bgs=c(my.bgs, geo.make(state="MA", county="Middlesex", tract=370300, block.group=1:2, check=T), geo.make(state="MA", county="Middlesex", tract=370400, block.group=1:3, combine=T, combine.term="Select Blockgroups, Tract 3703", check=T))
Testing geography item 1: Tract 370300, Blockgroup 1, Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370300, Blockgroup 2, Middlesex County, Massachusetts .... OK.
Testing geography item 1: Tract 370400, Blockgroup 1, Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370400, Blockgroup 2, Middlesex County, Massachusetts .... OK.
Testing geography item 3: Tract 370400, Blockgroup 3, Middlesex County, Massachusetts .... OK.
> more.total.pop=acs.fetch(geo=more.bgs, table.number="B01003")
> more.total.pop
ACS DATA:
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                                             B01003_001
Select Blockgroups, Tracts 387100 and 387201 6281 +/- 481.733328720362
Block Group 1                                315 +/- 132
Block Group 2                                1460 +/- 358
Select Blockgroups, Tract 3703               2594 +/- 487.719181496894
In closing: I hope this helps, and be sure to contact me if you have other questions/problems about using the package.
[1] Note that tracts are often referred to in a strange “four-digit plus decimal extension” shorthand, so “tract 387201” may also be known as “tract 3872.01”. When working with this package, always use six-digit tract numbers without the decimal point. If the tract number seems to be only four digits long, add two “trailing” zeroes at the end.
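The conversion described in this note can be sketched in a line of R. The helper name tract.code is hypothetical (it is not part of the acs package), and it assumes the shorthand is a plain number with at most two decimal places.

```r
# Hypothetical helper, not part of the acs package: convert tract shorthand
# such as 3872.01 or 3871 into the form without the decimal point
tract.code <- function(shorthand) {
  # multiply by 100 to absorb the two-digit extension, then round away
  # any floating-point fuzz before converting to an integer
  as.integer(round(as.numeric(shorthand) * 100))
}

tract.code(3872.01)
tract.code("3871")
```

Because the result is an integer, any leading zeroes in a low-numbered tract would be dropped, so check those cases by hand before passing them to geo.make().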
It’s been a while since I last updated the acs.R package, but as noted here, I’ll be using CityState to provide updates and test-versions of the package prior to uploading to CRAN. I’m happy to report that we now have a near-final package of version 1.0.
The most significant improvements to the package (beyond those mentioned previously) are the following:
- The package is now capable of downloading data directly from the new Census American Community Survey API and importing into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
- The package includes a new “geo.make()” function to allow users to create their own custom geographies for organizing and downloading data; and
- The package provides two special “lookup” tools to help users filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. These functions return new R “lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.
I want to thank the very kind folks at the Puget Sound Regional Council, who have been supporting the development of this package (in exchange for some special attention to scripts and functions they really want to include for themselves and their member communities). They continue to provide excellent help and advice “from the trenches” as we refine the package.
If you’re interested in trying out the new version, you can download it below, along with a brief set of “Introductory Notes” written for the team at PSRC. (Users may also want to check out the manual for the previous version of the package and this article from 2011 on the package.)
In order to keep posts in this blog focused on the core mission (as stated above: “cities, planning, data, communities, participation”), I’ve decided to move film reviews to the new UrbanFilm site, home of Urban//Planning Film Review. So:
Previous missions have demonstrated a whole lot of things you can do with census data. Here are a few of the problems you can get yourself into.
Census Geography Pitfalls
- Unequal tracts: Despite what you may think, not all Census tracts (and their composite block groups and blocks) are created equal. The Census Bureau tries to structure their geography so that all tracts will be approximately the same size (about 4,000 people), but in practice there is a pretty large range (between 1,500 and 8,000 people per tract). If you are looking at raw numbers (counts of any sort), be sure to think about the overall population—it’s the denominator you’ll need to put the figures in perspective; conversely, if you are looking at percentages, remember that a small percentage of a large tract could actually be more people than a large percentage of a very small one.
- Overlapping districts: Unfortunately, although the formal “pyramid” of Census geography is well-structured—building from block to block group to tract and so on up—our political and cultural divisions are not always so straightforward: cities sometimes spread across county lines, metropolitan areas may even cross state lines, and legislative districts have become a gerrymandered mess that would drive any rational cartographer to drink. As a result, there may be times when Census geographers have been forced to choose between a strictly “nested” geography that ignores higher-order political elements, and one with intermediate levels that do not fit neatly within each other.
- Confusing or ambiguous place names: Partially related to the previous point, and partially due to the general orneriness of the culture (or perhaps the species), there are often times when the same name will occur in multiple places in Census geography. The name “New York” refers to a state, a metro region, a city, a county (strangely, one that is smaller than its city), and even an avenue in Atlantic City (or on the Monopoly Board). Luckily, once you get down to the level of census tracts and below, you enter the realm of pure-and-orderly numbers, and can largely avoid this trap (they are even sometimes referred to as “logical record numbers,” or LOGRECNO), although it’s a lot less fun to say “AR census tract 9803 block group 3” when you could be saying “Goobertown, Arkansas”.
- Changing boundaries: Occasionally the Census Bureau needs to redraw the lines for some particular location—perhaps a city has annexed new land, or a large county has been split by an act of the state legislature. In these situations, you may see a sharp rise (or drop) in the counts from one Census to the next. For example, according to the 2000 census, the city of Bigfork, Montana had 1,421 people; in the 2010 census, this figure had grown to 4,270—a seeming tripling of the population. However, upon closer scrutiny, it turns out that most of this increase was the result of a change in census boundaries. (These situations may also exacerbate some of the previous problems.)
Many of the “reconnaissance” missions found here either explore some existing public database (such as the Census, via FactFinder), or demonstrate innovative tools to gather new data for your planning efforts. Sometimes, however, you will be interested in making use of information contained within City Hall or some other public agency, but it is not entirely clear how to gain access to it. When access to information is the problem, it is important to understand the notion of public records and the proper application of the “Federal Freedom of Information Act” (and any parallel state legislation).
What the law says
President Lyndon B. Johnson signed the Freedom of Information Act (known as FOIA) into law — appropriately, albeit somewhat ironically, given the players involved — on July 4, 1966, thereby establishing a consistent process under which citizens could enforce their right to access government records. Importantly — as legal scholars and civil rights advocates will be quick to note — the right itself existed prior to the Act: in a democratic society, public records are already public, and we have every right to view them, copy them, analyze and distribute them, and otherwise be as nosy as we want to be. Remember: it’s government of the people by the people, and we shouldn’t be shy about asking what it’s up to even when it operates behind closed doors.
Over the past five decades, FOIA requests have become a standard step in investigative journalism, citizen government accountability movements, and even high-profile lawsuits. Nationwide, more than 500,000 requests are made each year, resulting (eventually; see below) in one of the most abundant sources of information about an astonishing array of topics, from arrest records to political campaign spending, from tax assessments to high-school drop-out rates, and so on.[1]
In World on a Wire (reviewed previously), Rainer Werner Fassbinder explored the possibility of creating a miniature world through the use of a computer. In Hurdy Gurdy, a wonderful new short film from a German and Estonian collaboration, we get to enjoy the ways that the camera itself can render our real-world in apparent miniature (although I suspect a computer played a part as well…), giving us an entirely new and delightfully playful perspective on everyday scenes of urban life.
The film — all of four minutes long — uses stop-motion photography along with a technique that either is, or perhaps simulates, what is known as “tilt-shift” photography. The images below give a rough sense of the effect, which is to change the depth of focus and the level of detail; when combined with the increased speed and mechanical jerkiness (due to the stop-motion animation), the film transforms footage of a typical sea-side town into a magical micropolis of urban interaction: a true sidewalk ballet which unfolds as tourists arrive, streetcars come and go, crowds surge and flow, and daily life weaves and cycles in an endless state of humming activity. (The title itself refers to the mechanical music box, where one could just wind it up again and have the whole scene-and-song play over again and again.)
A number of other short videos using the tilt-shift technique can be found on-line, and quite a few choose city scenes or the movement of crowds to show off the magic; for example, see this popular short depicting a day in the life at Disney or this one showing streetlife in New York City. But it would be wrong to regard Hurdy Gurdy as nothing more than a cool demonstration of a visual trick: rather than letting the technique be the whole story, Seideneder and Pfeiffer use the effect to focus us on the beauty, color, and harmony of our ever-changing world. In a surprising way, even as we watch the film and smile and wonder and are entertained and entranced by this non-stop motion, we discover the time and space to meditate on the smallness of our individual existence and the majesty of the patterns we collectively create.
The film was screened in Somerville, MA, as part of the 2012 Independent Film Festival of Boston, and it seems to be making the rounds of similar festivals here and abroad, including Cleveland, Woodstock, Florida, Lisbon, Rotterdam, and Cannes. Look for it wherever independent films are found.
In addition to the general caution against using past data for projecting future conditions (and the need for equally spaced time intervals mentioned above), the particulars of time series data require additional attention to some special issues.
Inflation and Constant Dollars
Any time series that deals with dollars (or yen, pounds sterling, wampum, or other forms of currency) must confront the fact that the value of money changes over time. If you are simply making a time series showing the shrinking value of the dollar, that’s fine (it’s what you want to show), but if you want to show something else (say, changes in wages or home prices), then you will need to correct your data to some common base. Usually this is done by starting with a base year (often the start or end of the series, or the “current” year) and adjusting values based on changes to some official inflation statistic (e.g., the consumer price index).[1]
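As a rough sketch of that adjustment in R: each nominal value is multiplied by the ratio of the base-year index to that year's index. All of the numbers below are invented for illustration (they are not real wage or CPI figures), and the variable names are my own.

```r
# Made-up nominal wages and index values, for illustration only
years   <- c(2000, 2005, 2010)
nominal <- c(30000, 34000, 38000)
cpi     <- c(100, 113, 127)   # hypothetical price index, not real CPI data

# Express every value in base-year (here, 2010) dollars
base.year <- 2010
real <- nominal * cpi[years == base.year] / cpi

data.frame(years, nominal, real)
```

The base-year value is unchanged by construction, while earlier values are scaled up to reflect the lower price level in those years.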
Growth and Change to the Underlying Population
Over time — especially over long periods — the population of a place can change quite a lot, both in terms of overall numbers and the demographic components. As with inflation, this may be precisely the change that you are interested in observing and predicting (as in the first examples in this chapter), but at times it can introduce a spurious or intervening variable into your analysis.
In our last mission we used R to plot a trend-line for population growth in Houston, based on historical data from the past century. Depending on which of two different methods we used, we arrived at an estimate for the city’s 2010 population of 2,144,531 (based on the 100-year growth trend for the city) or 2,225,125 (based on the steeper growth trend of the past fifty years). Looking now at the official Census count for 2010, it turns out that our guesses are close, but both are too high: the actual reported figure for 2010 is 2,099,451.
It would have been surprising to have guessed perfectly based on nothing other than a linear trend — and the fact that we came as close as we did speaks well of this sort of “back of the envelope” projection technique (at least for the case of steady-growth). But there was a lot of information contained in those data points that we essentially ignored: our two trendlines were really based on nothing more than a start and an end point.
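To put a number on “as close as we did,” here is a quick sketch that compares the two projections with the official count, using only the three figures quoted above (the variable names are made up for this example).

```r
# Official 2010 count and our two trend-based estimates
actual    <- 2099451
estimates <- c(century.trend = 2144531, fifty.year.trend = 2225125)

# Overshoot in people, and as a percentage of the actual count
error     <- estimates - actual
pct.error <- round(100 * error / actual, 1)

error
pct.error
```

The 100-year trend overshoots by about two percent and the fifty-year trend by about six percent.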
A more sophisticated set of tools for making projections, which may be able to extract some extra meaning from the variation contained in the data, is provided in R by the excellent forecast package, developed by Rob Hyndman of Monash University in Australia. To access these added functions, you’ll need to install it:
> install.packages("forecast")
> library(forecast)
R: an object with
Although R is perfectly happy to help you analyze and plot time series data organized in vectors and dataframes, it actually has a specialized object class for this sort of thing, created with the ts() function. Remember: R is an “object-oriented” language. Every object (a variable, a dataframe, a function, a time series) is associated with a certain class, which helps the language figure out how to manage and interact with it. To find the class of an object, use the class() function:
> a=c(1,2)
> class(a)
[1] "numeric"
> a=TRUE
> class(a)
[1] "logical"
> class(plot)
[1] "function"
> a=ts(1)
> class(a)
[1] "ts"
In our previous mission we plotted population numbers in Houston for 1900–2000, to start to understand the growth trend for that city. Now, what if we didn’t have access to the latest Census figures, and we wanted to try to guess Houston’s population for 2010, using nothing but the data from 1900–2000?
One place to start would be with the 2000 population (1,953,631) and adjust it a bit based on historical trends. With 100 years’ worth of data, we can do this in R with some simple vector math.[1]
> attach(houston.pop) # optional, see footnote
> population[11] # don't forget: 11, not 10, data points
[1] 1953631
> annual.increase=(population[11]-population[1])/100 # watch the parentheses!
> population[11]+10*annual.increase
[1] 2144531
Remember that we actually have eleven data points, since we have both 1900 and 2000, so we need to specify population[11] as our endpoint. But since those eleven points span only ten decade intervals (100 years), we divide by 100 to get the annual increase. Adding ten times this increase to the 2000 population, we get an estimate for 2010 of 2,144,531. (Bonus question: based on this estimated annual increase, in what year would Houston have passed the two-million mark?[2])