My Two Census

Formerly the non-partisan watchdog of the 2010 US Census, and currently an opinion blog that covers all things political, media, foreign policy, globalization, and culture…but sometimes returning to its census/demographics roots.

Freakonomics: Justin Wolfers says you can’t trust the 2010 Census

The following comes from Justin Wolfers, a professor at my alma mater, the University of Pennsylvania, who writes for The New York Times’ Freakonomics blog and answers the age-old question… “Can you trust the census?”:

No. At least that’s the conclusion of an important new paper (ungated version here) by Trent Alexander, Michael Davern and Betsey Stevenson, who find enormous errors in some critically important economic datasets.

Let’s start with the 2000 Decennial Census. Your responses to the Census were used for two purposes. First, the Census Bureau tallied up every response to produce its official population counts. And second, it produced a 1-in-20 sub-sample of these responses, which it made available for analysis by researchers. Just about every economist I know has used this Census sub-sample, as do a fair number of demographers, sociologists, political scientists, and private-sector market researchers.

The errors are documented in a stunningly straightforward manner. The authors compare the official census count (based on the tallying up of all Census forms) with their own calculations, based on the sub-sample released for researchers (the “public use micro sample,” available through IPUMS). If all is well, then the authors’ estimates should be very close to 100% of the official population count. But they aren’t:

Census Chart
Source: Inaccurate Age and Sex Data in the Census PUMS Files: Evidence and Implications
Trent Alexander, Michael Davern and Betsey Stevenson

The two estimates are pretty similar for those younger than 65. But then things go haywire, with the alternative estimates disagreeing by as much as 15%. In fact, the microdata suggest that there are more very old men than very old women — I know some senior women who wish this were true! The Census Bureau has confirmed that the problem isn’t with the authors’ calculations. Rather, the problem is in the public-use microdata sample.
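For readers who want to see the shape of that check, here is a minimal sketch in Python. It is not the authors' code. The record layout, the function name `pums_to_official_ratio`, the flat weight of 20 per record (a simplification of a 1-in-20 sample) and every number in the toy data are assumptions made up for illustration.

```python
from collections import defaultdict


def pums_to_official_ratio(records, official_counts):
    """Return the microdata estimate as a percentage of the official count.

    records: iterable of (age, sex, weight) tuples from the sub-sample
             (hypothetical layout, not the real PUMS file format).
    official_counts: dict mapping (age, sex) to the full-count tally.
    """
    estimates = defaultdict(float)
    for age, sex, weight in records:
        # Each sampled person stands in for `weight` people in the population.
        estimates[(age, sex)] += weight
    return {
        cell: 100.0 * estimates[cell] / count
        for cell, count in official_counts.items()
        if count > 0
    }


# Invented toy data: 50 sampled 40-year-old men and 23 sampled 85-year-old men.
records = [(40, "M", 20.0)] * 50 + [(85, "M", 20.0)] * 23
official = {(40, "M"): 1000, (85, "M"): 400}

ratios = pums_to_official_ratio(records, official)
for cell, pct in sorted(ratios.items()):
    print(cell, f"{pct:.1f}% of official count")
```

In this made-up example the 40-year-olds come out at 100% of the official count while the 85-year-olds are overstated, which is the kind of gap the chart shows for the older age groups.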

What’s the source of the problem? The Census Bureau purposely messes with the microdata a little, to protect the identity of each individual. For instance, if they recode a 37-year-old expat Aussie living in Philadelphia as a 36-year-old, then it’s harder for you to look me up in the microdata, which protects my privacy. In order to make sure the data still give accurate estimates, it is important that they also recode a 36-year-old with similar characteristics as being 37. This gives you the gist of some of their “disclosure avoidance procedures.” While it may all sound a bit odd, if these procedures are done properly, the data will yield accurate estimates, while also protecting my identity. So far, so good.
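The swap idea is easy to sketch. What follows is a toy version in Python and not the Census Bureau's actual procedure, which the post only describes in outline. The function name `swap_adjacent_ages`, the field names and the three example records are all hypothetical.

```python
import random
from collections import Counter


def swap_adjacent_ages(records, match_keys, rng):
    """Swap ages between pairs of records that are one year apart and
    share the same values on `match_keys`. Returns new records."""
    out = [dict(r) for r in records]  # leave the input untouched

    # Index record positions by (matching characteristics, age).
    cells = {}
    for i, r in enumerate(out):
        key = tuple(r[k] for k in match_keys)
        cells.setdefault((key, r["age"]), []).append(i)

    used = set()  # each record takes part in at most one swap
    for (key, age), idxs in cells.items():
        younger = [i for i in idxs if i not in used]
        older = [j for j in cells.get((key, age + 1), []) if j not in used]
        rng.shuffle(older)
        for i, j in zip(younger, older):
            out[i]["age"], out[j]["age"] = out[j]["age"], out[i]["age"]
            used.update((i, j))
    return out


# Invented records for illustration only.
records = [
    {"age": 37, "sex": "M", "city": "Philadelphia", "born": "Australia"},
    {"age": 36, "sex": "M", "city": "Philadelphia", "born": "USA"},
    {"age": 52, "sex": "F", "city": "Philadelphia", "born": "USA"},
]
masked = swap_adjacent_ages(records, ("sex", "city"), random.Random(0))

ages_before = Counter(r["age"] for r in records)
ages_after = Counter(r["age"] for r in masked)
print(ages_before == ages_after)  # the age distribution is preserved
```

Because every recode is one half of a two-way swap, the number of people at each age stays the same even though individual records change. A one-sided recode, with no matching record moved the other way, would not have that property.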

But the problem arose because of a programming error in how the Census Bureau ran these procedures. The right response is obvious: fix the programs, and publish corrected data. Unfortunately, the Census Bureau has refused to correct the data.

The problem also runs a bit deeper. If the mistake were just the one shown in the above graph, it would be easy to simply re-scale the estimates so that there are no longer too many, say, 85-year-old men — just weight them down a bit. But it turns out that the same coding error also messes up the correlation between age and employment, or age and marital status (and, the authors suspect, possibly other correlations as well). When you break several correlations like this, there’s no easy statistical fix.
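A small sketch shows why weighting down is not enough. It is a toy illustration in Python, not the paper's analysis: the function names, the flat weights and the counts are invented, and the ten employed 85-year-olds are an assumed stand-in for records whose ages were wrongly recoded.

```python
def reweight_to_official(records, official_counts):
    """Scale each record's weight so age-cell totals match the official counts."""
    totals = {}
    for r in records:
        totals[r["age"]] = totals.get(r["age"], 0.0) + r["weight"]
    return [
        dict(r, weight=r["weight"] * official_counts[r["age"]] / totals[r["age"]])
        for r in records
    ]


def weighted_total(records, age):
    return sum(r["weight"] for r in records if r["age"] == age)


def employment_rate(records, age):
    """Weighted share of people at `age` who are employed."""
    employed = sum(r["weight"] for r in records if r["age"] == age and r["employed"])
    return employed / weighted_total(records, age)


# Invented data: 23 sampled 85-year-old men, 10 of them recorded as employed.
sample = [{"age": 85, "employed": i < 10, "weight": 20.0} for i in range(23)]
official = {85: 400}

fixed = reweight_to_official(sample, official)

print(weighted_total(sample, 85), weighted_total(fixed, 85))
print(employment_rate(sample, 85), employment_rate(fixed, 85))
```

Re-weighting brings the head count back in line with the official figure, but every record in the cell is scaled by the same factor, so the employment rate among 85-year-olds is exactly as distorted as before.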


3 Responses to “Freakonomics: Justin Wolfers says you can’t trust the 2010 Census”

  1. DVR Says:

    The headline for this post is misleading. Mr. Wolfers says nothing about the 2010 Census in his post. The paper he references discusses the 2000 census, the ACS, and the CPS. The Census Bureau has refused to correct that data. Hopefully, it will correct the programming issue for the 2010 census.

  2. Suitlandman Says:

    These errors are the tip of the iceberg. Data processing errors and data quality deficiencies abound in data collected and processed by the Census Bureau.

    Few data analysts work at the Census Bureau. Most staff are managers, clerks, geographers, administrative assistants, marketing and public relations specialists, computer technicians, project managers, computer programmers, sampling statisticians, contract managers, contractors, friends and relatives of elected officials and political patronage appointees. Only a few employees are allowed to produce reports of data analyses. Some of them are incompetent but all of them can be trusted to or pressured to produce results that management will be comfortable with.

    It has been easier for the Census Bureau to conceal data quality problems than to work on them.
    Those responsible number in the hundreds.

  3. My Two Census » Blog Archive » Solutions to the Census Bureau’s Statistic Failures… Says:

    [...] week we wrote about the Freakonomics article that questioned the Census Bureau’s methodologies for reporting statistics. Well, here are a [...]