acs-r Mailing List: keep in the loop

Posted by Ezra Glenn on April 24, 2013
Census, Code, Self-promotion / No Comments

We’re pleased to announce the creation of a new mailing list for the acs.R package. The “acs” package allows users to download, manipulate, analyze, and visualize data from the American Community Survey in R; the “acs-r” e-mail list allows members to keep in touch and share information about the package, including updates from the development team concerning improvements, user questions and help requests, worked examples, and more. To register, visit http://mailman.mit.edu/mailman/listinfo/acs-r.


acs.R: a worked example using blockgroup-level data

Posted by Ezra Glenn on March 11, 2013
Census, Code / 2 Comments

A very nice user wrote the following in an email to me about the latest version of the acs.R package:

 
> Thanks for providing such a wonderful package in R. I'm having
> difficulty defining a geo at the block group level. Would you mind
> sharing an example with me?

I responded via email, but thought that my answer — which took the form of a short worked-example — might be helpful to others, so I am posting it here as well. Here’s what I said:

To showcase how the package can create new census geographies from block groups, let’s look at my home state of Massachusetts, in Middlesex County. If I wanted to get info on all the block groups for tract 387201 (see footnote 1), I could create a new geo like this:

> my.tract=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group="*", check=T)
Testing geography item 1: Tract 387201, Blockgroup *, 
  Middlesex County, Massachusetts .... OK.
> 

(This might be a useful first step, especially if I didn’t know how many block groups there were in the tract, or what they were called. Also, note that check=T is not required, but can often help ensure you are dealing with valid geos.)

If I then wanted to get very basic info on these block groups – say, table number B01003 (Total Population), I could type:

> total.pop=acs.fetch(geo=my.tract, table.number="B01003")
> total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 3 1010 +/- 156
Block Group 4 938 +/- 214 
> 

Here we can see that block.group="*" has yielded the actual four block groups for the tract.

Now, if instead of wanting all of them, we only wanted the first two, we could just type:

> my.bgs=geo.make(state="MA", county="Middlesex", 
  tract=387201, block.group=1:2, check=T)
Testing geography item 1: Tract 387201, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387201, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
> 

And then:

> bg.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> bg.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
> 

Now, if we wanted to add in some block groups from tract 387100 (a.k.a. “tract 3871”; remember, we need those trailing zeroes), say block groups 2 and 3, we could enter:

> my.bgs=my.bgs+geo.make(state="MA", county="Middlesex", 
  tract=387100, block.group=2:3, check=T)
Testing geography item 1: Tract 387100, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 387100, Blockgroup 3, 
  Middlesex County, Massachusetts .... OK.

And then:

> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
              B01003_001  
Block Group 1 2681 +/- 319
Block Group 2 952 +/- 213 
Block Group 2 827 +/- 171 
Block Group 3 1821 +/- 236
> 

Note that the short row names can be confusing, as in this example, but if you type:

> geography(new.total.pop)
           NAME state county  tract blockgroup
1 Block Group 1    25     17 387201          1
2 Block Group 2    25     17 387201          2
3 Block Group 2    25     17 387100          2
4 Block Group 3    25     17 387100          3
> 

you can see that the two entries for “Block Group 2” are actually in different tracts. (Also note: you can combine block groups and other levels of geography, all in a single geo object…)

And now, to show off the coolest part! Let’s say I don’t just want to get data on the four blockgroups, but I want to combine them into a single new geographic entity. Before downloading, I could simply say:

> combine(my.bgs)=T
> combine.term(my.bgs)="Select Blockgroups"
> new.total.pop=acs.fetch(geo=my.bgs, table.number="B01003")
> new.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                   B01003_001               
Select Blockgroups 6281 +/- 481.733328720362
>

And see – voila! – it sums the estimates and deals with the margins of error, so you don’t need to get your hands dirty with square roots and standard errors and all that messy stuff.
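For the curious, the arithmetic behind that combined margin of error can be reproduced by hand. The sketch below is plain R (it does not call the acs package) and assumes the usual root-sum-of-squares rule for adding independent estimates. That assumption is consistent with the output printed above, but it is not a description of the package's internals.

```r
# Estimates and 90% margins of error for the four block groups,
# copied from the uncombined acs.fetch() output above.
estimates <- c(2681, 952, 827, 1821)
moes      <- c(319, 213, 171, 236)

# Assumption: the margin of error of a sum is the square root of the
# sum of the squared margins of error.
combined.estimate <- sum(estimates)
combined.moe      <- sqrt(sum(moes^2))

combined.estimate
combined.moe
```

Both values agree with the “Select Blockgroups” row: 6281 and roughly 481.73.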

You can even create interesting nested geo.sets, where some of the lower levels are combined, like this:

> combine.term(my.bgs)="Select Blockgroups, 
  Tracts 387100 and 387201"
> more.bgs=c(my.bgs, geo.make(state="MA", 
  county="Middlesex", tract=370300, block.group=1:2, check=T), 
  geo.make(state="MA", county="Middlesex", tract=370400, 
  block.group=1:3, combine=T, combine.term="Select Blockgroups, 
  Tract 3703", check=T)) 
Testing geography item 1: Tract 370300, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370300, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 1: Tract 370400, Blockgroup 1, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 2: Tract 370400, Blockgroup 2, 
  Middlesex County, Massachusetts .... OK.
Testing geography item 3: Tract 370400, Blockgroup 3, 
 Middlesex County, Massachusetts .... OK.
> more.total.pop=acs.fetch(geo=more.bgs, table.number="B01003")
> more.total.pop
ACS DATA: 
 2007 -- 2011 ;
  Estimates w/90% confidence intervals;
  for different intervals, see confint()
                                             B01003_001               
Select Blockgroups, Tracts 387100 and 387201 6281 +/- 481.733328720362
Block Group 1                                315 +/- 132              
Block Group 2                                1460 +/- 358             
Select Blockgroups, Tract 3703               2594 +/- 487.719181496894
> 

In closing: I hope this helps, and be sure to contact me if you have other questions/problems about using the package.

Footnotes:

1 Note that tracts are often referred to in a strange “four-digit+decimal extension” shorthand, so “tract 387201” may also be known as “tract 3872.01”. When working with this package, always use six-digit tract numbers without the decimal point. If the tract number seems to be only four digits long, add two “trailing” zeroes at the end.
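If you have a long list of tracts in the shorthand form, the conversion is easy to script. The helper below is a hypothetical convenience function written for this note, not something provided by the acs package; it assumes every input is in the shorthand form (four digits, with or without a two-digit decimal extension).

```r
# Hypothetical helper (not part of the acs package): convert tract
# shorthand such as "3872.01" or "3871" to the six-digit form.
# Only pass shorthand values; a number that is already six digits
# long would be multiplied by 100 as well.
tract.code <- function(shorthand) {
  as.integer(round(as.numeric(shorthand) * 100))
}

tract.code(c("3872.01", "3871"))
```

This turns the two tracts used in the example into 387201 and 387100, ready to pass as the tract argument.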


Major improvements to acs.R: sneak peek at version 1.0

Posted by Ezra Glenn on February 05, 2013
Census, Code / 4 Comments

It’s been a while since I last updated the acs.R package, but as noted here, I’ll be using CityState to provide updates and test-versions of the package prior to uploading to CRAN. I’m happy to report that we now have a near-final package of version 1.0.

The most significant improvements to the package (beyond those mentioned previously) are the following:

  • The package is now capable of downloading data directly from the new Census American Community Survey API and importing into R (with proper statistical treatment of estimates and error, variable and geographic relabeling, and more), all through a single “acs.fetch()” function;
  • The package includes a new “geo.make()” function to allow users to create their own custom geographies for organizing and downloading data; and
  • The package provides two special “lookup” tools to help users filter through all the existing Census geographies (with the “geo.lookup()” function) and tables (with the “acs.lookup()” function) to find exactly what they want. These functions return new R “lookup” objects which can be saved, manipulated, and passed to acs.fetch() for downloading data.

I want to thank the very kind folks at the Puget Sound Regional Council, who have been supporting the development of this package (in exchange for some special attention to scripts and functions they really want to include for themselves and their member communities). They continue to provide excellent help and advice “from the trenches” as we refine the package.

If you’re interested in trying out the new version, you can download it below, along with a brief set of “Introductory Notes” written for the team at PSRC. (Users may also want to check out the manual for the previous version of the package and this article from 2011 on the package.)


Film Reviews Migrating –> UrbanFilms

Posted by Ezra Glenn on August 17, 2012
Film / No Comments

In order to keep posts in this blog focused on the core mission (as stated above: “cities, planning, data, communities, participation”), I’ve decided to move film reviews to the new UrbanFilm site, home of Urban//Planning Film Review. So:

  • updates on planning hacks, census data, Shape Your Neighborhood how-to missions, and urban planning practice: find them here at CityState.
  • reviews, interviews, opinion, and news related to films about cities and urban planning: find them at UrbanFilm.

Pitfalls of Working with Census Data

Posted by Ezra Glenn on July 27, 2012
Census, Missions, Reconnaissance, Shape Your Neighborhood / No Comments

Previous missions have demonstrated a whole lot of things you can do with census data. Here are a few of the problems you can get yourself into.

Census Geography Pitfalls

  • Unequal tracts: Despite what you may think, not all Census tracts (and their composite block groups and blocks) are created equal. The Census Bureau tries to structure their geography so that all tracts will be approximately the same size (about 4,000 people), but in practice there is a pretty large range (between 1,500 and 8,000 people per tract). If you are looking at raw numbers (counts of any sort), be sure to think about the overall population—it’s the denominator you’ll need to put the figures in perspective; conversely, if you are looking at percentages, remember that a small percentage of a large tract could actually be more people than a large percentage of a very small one.
  • Overlapping districts: Unfortunately, although the formal “pyramid” of Census geography is well-structured—building from block to block group to tract and so on up—our political and cultural divisions are not always so straightforward: cities sometimes spread across county lines, metropolitan areas may even cross state lines, and legislative districts have become a gerrymandered mess that would drive any rational cartographer to drink. As a result, there may be times when Census geographers have been forced to choose between a strictly “nested” geography that ignores higher-order political elements, and one with intermediate levels that do not fit neatly within each other.
  • Confusing or ambiguous place names: Partially related to the previous point, and partially due to the general orneriness of the culture (or perhaps the species), there are often times when the same name will occur in multiple places in Census geography. The name “New York” refers to a state, a metro region, a city, a county (strangely, one that is smaller than its city), and even an avenue in Atlantic City (or on the Monopoly Board). Luckily, once you get down to the level of census tracts and below, you enter the realm of pure-and-orderly numbers, and can largely avoid this trap (they are even sometimes referred to as “logical record numbers,” or LOGRECNO), although it’s a lot less fun to say “AR census tract 9803 block group 3” when you could be saying “Goobertown, Arkansas”.
  • Changing boundaries: Occasionally the Census Bureau needs to redraw the lines for some particular location—perhaps a city has annexed new land, or a large county has been split by an act of the state legislature. In these situations, you may see a sharp rise (or drop) in the counts from one Census to the next. For example, according to the 2000 census, the city of Bigfork, Montana had 1,421 people; in the 2010 census, this figure had grown to 4,270—a seeming tripling of the population. However, upon closer scrutiny, it turns out that most of this increase was the result of a change in census boundaries. (These situations may also exacerbate some of the previous problems.)
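The “unequal tracts” warning in the list above is easy to check with a little arithmetic. Here is a minimal sketch in R using two invented tracts at either end of the size range; the populations and percentages are made up for illustration and do not come from any Census table.

```r
# Hypothetical tracts at the two ends of the size range mentioned above.
tracts <- data.frame(
  tract      = c("large", "small"),
  population = c(8000, 1500),
  share.pct  = c(5, 20)   # made-up percentage of residents with some trait
)

# Convert each percentage back into a head count.
tracts$count <- tracts$population * tracts$share.pct / 100
tracts
```

Here 5 percent of the large tract is 400 people, while 20 percent of the small one is only 300, so the “smaller” percentage actually describes more people.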


Freedom of Information: Public Records Requests

Posted by Ezra Glenn on July 20, 2012
Missions, Reconnaissance, Shape Your Neighborhood / No Comments

Many of the “reconnaissance” missions found here either explore some existing public database (such as the Census, via FactFinder), or demonstrate innovative tools to gather new data for your planning efforts. Sometimes, however, you will be interested in making use of information contained within City Hall or some other public agency, but it is not entirely clear how to gain access to it. When access to information is the problem, it is important to understand the notion of public records and the proper application of the “Federal Freedom of Information Act” (and any parallel state legislation).

What the law says

President Lyndon B. Johnson signed the Freedom of Information Act (known as FOIA) into law — appropriately, albeit somewhat ironically, given the players involved — on July 4, 1966, thereby establishing a consistent process under which citizens could enforce their right to access government records. Importantly — as legal scholars and civil rights advocates will be quick to note — the right itself existed prior to the Act: in a democratic society, public records are already public, and we have every right to view them, copy them, analyze and distribute them, and otherwise be as nosy as we want to be. Remember: it’s government of the people by the people, and we shouldn’t be shy about asking what it’s up to even when it operates behind closed doors.

Over the past five decades, requests under the Act have become a standard step in investigative journalism, citizen government accountability movements, and even high-profile lawsuits. Nationwide, more than 500,000 requests are made each year, resulting (eventually; see below) in one of the most abundant sources of information about an astonishing array of topics, from arrest records to political campaign spending, from tax assessments to high-school drop-out rates, and so on.1



Hurdy Gurdy (Daniel Seideneder and Daniel Pfeiffer, 2011)

Posted by Ezra Glenn on April 29, 2012
Film / 2 Comments

In World on a Wire (reviewed previously), Rainer Werner Fassbinder explored the possibility of creating a miniature world through the use of a computer. In Hurdy Gurdy, a wonderful new short film from a German and Estonian collaboration, we get to enjoy the ways that the camera itself can render our real world in apparent miniature (although I suspect a computer played a part as well…), giving us an entirely new and delightfully playful perspective on everyday scenes of urban life.

The film — all of four minutes long — uses stop-motion photography along with a technique that either is, or perhaps simulates, what is known as “tilt-shift” photography. The images below give a rough sense of the effect, which is to change the depth of focus and the level of detail; when combined with the increased speed and mechanical jerkiness (due to the stop-motion animation), the film transforms footage of a typical sea-side town into a magical micropolis of urban interaction: a true sidewalk ballet which unfolds as tourists arrive, streetcars come and go, crowds surge and flow, and daily life weaves and cycles in an endless state of humming activity. (The title itself refers to the mechanical music box, where one could just wind it up again and have the whole scene-and-song play over again and again.)

http://www.floridafilmfestival.com/images/uploads/cache/HurdyGurdy1-434x250.jpg

http://www.floridafilmfestival.com/images/uploads/cache/HurdyGurdy_300_print_crop-434x250.jpg

A number of other short videos using the tilt-shift technique can be found on-line, and quite a few choose city scenes or the movement of crowds to show off the magic; for example, see this popular short depicting a day in the life at Disney or this one showing streetlife in New York City. But it would be wrong to regard Hurdy Gurdy as nothing more than a cool demonstration of a visual trick: rather than letting the technique be the whole story, Seideneder and Pfeiffer use the effect to focus us on the beauty, color, and harmony of our ever-changing world. In a surprising way, even as we watch the film and smile and wonder and are entertained and entranced by this non-stop motion, we discover the time and space to meditate on the smallness of our individual existence and the majesty of the patterns we collectively create.

The film was screened in Somerville, MA, as part of the 2012 Independent Film Festival of Boston, and it seems to be making the rounds of similar festivals here and abroad, including Cleveland, Woodstock, Florida, Lisbon, Rotterdam, and Cannes. Look for it wherever independent films are found.


Pitfalls of Working with Time-Series Data

Posted by Ezra Glenn on April 24, 2012
Data, Missions, Shape Your Neighborhood, Simulation / No Comments

In addition to the general caution against using past data for projecting future conditions (and the need for equally spaced time intervals mentioned above), the particulars of time series data require additional attention to some special issues.

Inflation and Constant Dollars

Any time series that deals with dollars (or yen, pounds sterling, wampum, or other forms of currency) must confront the fact that the value of money changes over time. If you are simply making a time series showing the shrinking value of the dollar, that’s fine — it’s what you want to show — but if you want to show something else (say, changes in wages or home prices), then you will need to correct your data to some common base. Usually this is done by starting with a base year (often the start or end of the series, or the “current” year) and adjusting values based on changes to some official inflation statistic (e.g., the consumer price index).1
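As a rough illustration of the base-year adjustment described above, here is a minimal sketch in R. The dollar values and index readings are placeholders invented for the example (they are not real wage or CPI figures), and the column names are arbitrary; in practice you would substitute your own series and an official index.

```r
# Hypothetical nominal values and price-index readings; these are
# placeholders, not real wage or CPI figures.
series <- data.frame(
  year    = c(1990, 2000, 2010),
  nominal = c(20000, 30000, 40000),
  index   = c(100, 125, 160)
)

# Express every value in dollars of the base year (here, the last year).
base.index  <- series$index[series$year == 2010]
series$real <- series$nominal * base.index / series$index
series
```

In this made-up series the nominal values double, but in constant 2010 dollars the increase is only 25 percent (from 32,000 to 40,000).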

Growth and Change to the Underlying Population

Over time — especially over long periods — the population of a place can change quite a lot, both in terms of overall numbers and the demographic components. As with inflation, this may be precisely the change that you are interested in observing and predicting (as in the first examples in this chapter), but at times it can introduce a spurious or intervening variable into your analysis.



Examining Historical Growth III: The forecast() package

Posted by Ezra Glenn on April 21, 2012
Data, Missions, Shape Your Neighborhood, Simulation / No Comments

In our last mission we used R to plot a trend-line for population growth in Houston, based on historical data from the past century. Depending on which of two different methods we used, we arrived at an estimate for the city’s 2010 population of 2,144,531 (based on the 100-year growth trend for the city) or 2,225,125 (based on the steeper growth trend of the past fifty years). Looking now at the official Census count for 2010, it turns out that our guesses are close, but both too high: the actual reported figure for 2010 is 2,099,451.

It would have been surprising to have guessed perfectly based on nothing other than a linear trend — and the fact that we came as close as we did speaks well of this sort of “back of the envelope” projection technique (at least for the case of steady-growth). But there was a lot of information contained in those data points that we essentially ignored: our two trendlines were really based on nothing more than a start and an end point.
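To put numbers on how close those two guesses were, a few lines of vector math will do. This sketch uses only the three figures quoted above; the variable names are arbitrary.

```r
actual    <- 2099451                      # official 2010 Census count
estimates <- c(trend.100yr = 2144531,     # from the 100-year trend
               trend.50yr  = 2225125)     # from the 50-year trend

# How far each estimate overshot, in people and as a percentage.
overshoot     <- estimates - actual
overshoot.pct <- round(100 * overshoot / actual, 1)

overshoot
overshoot.pct
```

The 100-year trend overshoots by 45,080 people (about 2 percent), and the 50-year trend by 125,674 (about 6 percent).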

A more sophisticated set of tools for making projections (which may be able to extract some extra meaning from the variation contained in the data) is provided in R by the excellent forecast package, developed by Rob Hyndman of Monash University in Australia. To access these added functions, you’ll need to install it:

> install.packages("forecast")
> library(forecast)

Time-series in R: an object with class

Although R is perfectly happy to help you analyze and plot time series data organized in vectors and dataframes, it actually has a specialized object class for this sort of thing, created with the ts() function. Remember: R is an “object-oriented” language. Every object (a variable, a dataframe, a function, a time series) is associated with a certain class, which helps the language figure out how to manage and interact with it. To find the class of an object, use the class() function:

> a=c(1,2)
> class(a)
[1] "numeric"
> a=TRUE
> class(a)
[1] "logical"
> class(plot)
[1] "function"
> a=ts(1)
> class(a)
[1] "ts"
> 
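What makes a “ts” object more than a plain vector is that it carries its own time index. As a small sketch (written as a script rather than a console session), the three values below are made-up stand-ins for decennial counts; start and deltat are standard arguments to ts() giving the first time point and the spacing between observations.

```r
# Made-up values standing in for three decennial counts.
x <- ts(c(10, 12, 15), start = 1900, deltat = 10)

class(x)   # the same check used above
time(x)    # the year attached to each observation
```

The time() call shows each value labeled with its year (1900, 1910, 1920), which is what lets plotting and forecasting functions treat the spacing correctly.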



Examining Historical Growth II: Using the past to predict the future

Posted by Ezra Glenn on April 12, 2012
Data, Missions, Shape Your Neighborhood, Simulation / No Comments

In our previous mission we plotted population numbers in Houston for 1900–2000, to start to understand the growth trend for that city. Now, what if we didn’t have access to the latest Census figures, and we wanted to try to guess Houston’s population for 2010, using nothing but the data from 1900–2000?

One place to start would be with the 2000 population (1,953,631) and adjust it a bit based on historical trends. With 100 years’ worth of data, we can do this in R with a simple call to some vector math.1

> attach(houston.pop) # optional, see footnote
> population[11]      # don't forget: 11, not 10, data points
[1] 1953631
> annual.increase=(population[11]-population[1])/100   # watch the parentheses!
> population[11]+10*annual.increase
[1] 2144531
> 

Remember that we actually have eleven data points, since we have both 1900 and 2000, so we need to specify population[11] as our endpoint. But since those eleven points span only ten decade-long intervals (100 years), we divide by 100 to get the annual increase. Adding ten times this increase to the 2000 population, we get an estimate for 2010 of 2,144,531. (Bonus question: based on this estimated annual increase, in what year would Houston have passed the two-million mark?2)
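The same start-to-end logic can be wrapped in a small function so that the interval arithmetic is handled for you. This is only a sketch: project.linear is a hypothetical helper named for this example, and the numbers fed to it are invented rather than taken from the Houston data.

```r
# Hypothetical helper: extend the average annual change between the
# first and last observations out to a target year.
project.linear <- function(years, population, target.year) {
  n <- length(population)
  annual.increase <- (population[n] - population[1]) / (years[n] - years[1])
  population[n] + (target.year - years[n]) * annual.increase
}

# Made-up numbers, not the Houston data.
project.linear(years = c(1980, 1990, 2000),
               population = c(100, 150, 200),
               target.year = 2010)
```

Because the function divides by the span of years rather than by the number of data points, there is no need to remember whether to count to ten or eleven.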

