Pitfalls of Working with Time-Series Data

Posted by Ezra Glenn on April 24, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In addition to the general caution against using past data for projecting future conditions (and the need for equally spaced time intervals mentioned above), the particulars of time series data require additional attention to some special issues.

Inflation and Constant Dollars

Any time series that deals with dollars (or yen, pounds sterling, wampum, or other forms of currency) must confront the fact that the value of money changes over time. If you are simply making a time series showing the shrinking value of the dollar, that’s fine — it’s what you want to show — but if you want to show something else (say, changes in wages or home prices), then you will need to correct your data to some common base. Usually this is done by starting with a base year (often the start or end of the series, or the “current” year) and adjusting values based on changes to some official inflation statistic (e.g., the consumer price index).1
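
As a quick illustration, here is a minimal sketch of the adjustment itself, using hypothetical wage figures and illustrative index values (substitute real CPI data in practice): divide the base year's index value by each year's, and multiply.

> nominal <- c(30000, 32000, 34500)  # hypothetical wage series for three years
> cpi <- c(172.2, 177.1, 179.9)      # illustrative index values, not official CPI figures
> real <- nominal * cpi[1]/cpi       # restate each value in constant (base-year) dollars
> round(real)
[1] 30000 31115 33023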

Growth and Change to the Underlying Population

Over time — especially over long periods — the population of a place can change quite a lot, both in overall numbers and in demographic composition. As with inflation, this may be precisely the change that you are interested in observing and predicting (as in the first examples in this chapter), but at times it can introduce a spurious or intervening variable into your analysis.
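
One common safeguard, sketched here with hypothetical figures, is to restate raw counts as per-capita rates, so that growth in the underlying population doesn't masquerade as a trend in whatever you are measuring:

> crimes <- c(1200, 1320, 1450)         # hypothetical annual incident counts
> residents <- c(95000, 104000, 116000) # hypothetical population in the same years
> round(1000 * crimes / residents, 1)   # incidents per 1,000 residents
[1] 12.6 12.7 12.5

Here the raw counts climb steadily, but the rate per 1,000 residents is essentially flat: the "growth" belongs to the population, not the phenomenon.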

Continue reading…


Examining Historical Growth III: The forecast() package

Posted by Ezra Glenn on April 21, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In our last mission we used R to plot a trend-line for population growth in Houston, based on historical data from the past century. Depending on which of two different methods we used, we arrived at an estimate for the city's 2010 population of 2,144,531 (based on the 100-year growth trend for the city) or 2,225,125 (based on the steeper growth trend of the past fifty years). Looking now at the official Census count for 2010, it turns out that our guesses are close, but both a bit too high: the actual reported figure for 2010 is 2,099,451.

It would have been surprising to have guessed perfectly based on nothing other than a linear trend — and the fact that we came as close as we did speaks well of this sort of "back of the envelope" projection technique (at least in cases of steady growth). But there was a lot of information contained in those data points that we essentially ignored: our two trendlines were really based on nothing more than a start and an end point.

A more sophisticated set of tools for making projections — which may be able to extract some extra meaning from the variation contained in the data — is provided in R by the excellent forecast package, developed by Rob Hyndman of Monash University in Australia. To access these added functions, you'll need to install and load the package:

> install.packages("forecast")
> library(forecast)

Time-series in R: an object with class

Although R is perfectly happy to help you analyze and plot time series data organized in vectors and dataframes, it actually has a specialized object class for this sort of thing, created with the ts() function. Remember: R is an "object-oriented" language. Every object (a variable, a dataframe, a function, a time series) is associated with a certain class, which helps the language figure out how to manage and interact with it. To find the class of an object, use the class() function:

> a=c(1,2)
> class(a)
[1] "numeric"
> a=TRUE
> class(a)
[1] "logical"
> class(plot)
[1] "function"
> a=ts(1)
> class(a)
[1] "ts"
> 
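
To see where this is headed, here is a minimal sketch that assumes the houston.pop data from the earlier missions (eleven decennial counts, 1900–2000) and feeds a ts object to one of the package's fitting functions; holt() performs Holt's linear-trend exponential smoothing, one of several models forecast offers:

> houston.ts <- ts(houston.pop$population, start=1900, deltat=10)
> class(houston.ts)
[1] "ts"
> holt.fit <- holt(houston.ts, h=1)  # project one step (one decade) ahead, to 2010
> summary(holt.fit)                  # point forecast plus prediction intervals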

Continue reading…


Examining Historical Growth II: Using the past to predict the future

Posted by Ezra Glenn on April 12, 2012
Data, Missions, Shape Your Neighborhood, Simulation

In our previous mission we plotted population numbers in Houston for 1900–2000, to start to understand the growth trend for that city. Now, what if we didn’t have access to the latest Census figures, and we wanted to try to guess Houston’s population for 2010, using nothing but the data from 1900–2000?

One place to start would be to take the 2000 population (1,953,631) and adjust it a bit based on historical trends. With 100 years' worth of data, we can do this in R with some simple vector math.1

> attach(houston.pop) # optional, see footnote
> population[11]      # don't forget: 11, not 10, data points
[1] 1953631
> annual.increase=(population[11]-population[1])/100   # watch the parentheses!
> population[11]+10*annual.increase
[1] 2144531
> 

Remember that we actually have eleven data points, since we have both 1900 and 2000, so we need to specify population[11] as our endpoint. But since there are only ten decade intervals, we divide by 100 to get the annual increase. Adding ten times this increase to the 2000 population, we get an estimate for 2010 of 2,144,531. (Bonus question: based on this estimated annual increase, in what year would Houston have passed the two-million mark?2)
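
For the bonus question, the same linear assumption reduces to one line of arithmetic (shown here without spoiling the footnote): divide the remaining gap to two million by the annual increase, and add the result to 2000.

> 2000 + (2000000 - population[11])/annual.increase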

Continue reading…


Examining Historical Growth I: Basic trends

Posted by Ezra Glenn on April 11, 2012
Data, Missions, Shape Your Neighborhood, Simulation

The nature of predictions

To paraphrase John Allen Paulos, author of A Mathematician Reads the Newspaper, all expert predictions can be essentially restated in one of two ways: “Things will continue roughly as they have been until something changes”; and its corollary, “Things will change after an indeterminate period of stability.” Although these statements are both true and absurd, they contain a kernel of wisdom: simply assuming a relative degree of stability and painting a picture of the future based on current trends is the first step of scenario planning. The trick, of course, is to never completely forget the “other shoe” of Paulos’s statement: as the disclaimer states on all investment offerings, “Past performance is not a guarantee of future results”; at some point in the future our present trends will no longer accurately describe where we are headed. (We will deal with this as well, with a few “safety valves.”)

From the second stage of the Rational Planning Paradigm (covered in the background sections of the book) we should have gathered information on both past and present circumstances related to our planning effort. If we are looking at housing production, we might have data on annual numbers of building permits and new subdivision approvals, mortgage rates, and housing prices; if we are looking at public transportation we might need monthly ridership numbers, information on fare changes, population and employment figures, and even data on past weather patterns or changes in vehicle ownership and gas prices. The first step of projection, therefore, is to gather relevant information and get it into a form that you can use.
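
In R, "a form that you can use" can be as simple as a two-column data frame with one row per observation; this minimal sketch assumes a hypothetical houston.pop data set with year and population columns, like the one used in the missions that follow:

> str(houston.pop)  # one column for time, one for the measurement
> plot(population ~ year, data=houston.pop, type="b")  # a first look at the trend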

Since we will be thinking about changes over time in order to project a trend into the future, we'll need to make sure that our data has time as an element: a series of data points with one observation for each point or period of time is known as a time series. The exact units of time are not important—they could be days, months, years, decades, or something different—but it is customary (and important) to obtain data where points are regularly spaced at even intervals.1 Essentially, time series data is a special case of multivariate data in which we treat time itself as an additional variable and look for relationships as it changes. Luckily, R has some excellent functions and packages for dealing with time-series data, some of which we will cover below. To get started, however, let's consider a simple example and begin to think about what goes into projections.

Continue reading…


World on a Wire (Fassbinder, 1973)

Posted by Ezra Glenn on February 13, 2012
Film

Janus/Criterion has just re-released a beautiful print of Rainer Werner Fassbinder’s 1973 two-part film, World on a Wire, and I was fortunate enough to have 210 minutes free on a Saturday afternoon to go to the Brattle to watch it. It’s great.

Plot-wise, the film covers much of the same ground as The Matrix and Inception – although it was made roughly 30 years earlier – but that aspect is treated well in other reviews. That said, the themes of living in the dream-like reality of a world of simulacra – and the ultimate dream of escape to a higher reality – take on a special richness in Fassbinder’s work, infused with the pathos of counter-cultural 1970s Germans.1

Visually, the entire film (originally shot in square 16mm for television, like an instamatic photograph) is beautifully fake, presenting the veneer of the world that was the 1970s: plastic molded offices full of plastic molded furniture and plastic molded people with plastic, blank faces – with the exception of our hero, Fred Stiller, the new Director of the Simulacron Project at the Institute for Cybernetics and Futurology. Stiller’s work, known as Simulacron 1, is the most sophisticated computer simulation ever made, a massive program modeling a world of 10,000 “identity units” for the purpose of making accurate scientific and government projections. It’s a planner’s dream: a simulated world where real life plays out for the purposes of forecasting future conditions and testing various alternatives (“How much steel production will the economy require in 30 years?”; “Should we build more housing units in Baden-Württemberg or Schleswig-Holstein?”; and so on).

Continue reading…
