Andrew Ba Tran
November 4, 2015
An enumeration must be made every 10 years, which determines
In the past, surveys filled out as part of the Census have helped policymakers decide what to do in times of crisis, like during the Great Depression when they needed an idea of how grim the situation actually was.
But another effect of the Census is that the public can use the data to answer their own questions.
For some social and data scientists, working with Census data is a breeze. But for everyone else, it can be overwhelming. So today we’ll talk about understanding the Census — and how total beginners can retrieve datasets.
To best understand how the Census works, it’s important to understand its history.
The first Census in 1790 mainly wanted to track males older than 16 — to gauge military potential — as well as the number of free people and slaves. It surveyed the head of each household, which looked like this:
As the population grew, one of the biggest challenges was figuring out how to gather large amounts of data and how to count it without computers.
In 1870, the first tallying machine was introduced, but it was still a laborious process — and it took nearly a decade to count all the surveys, at which point it was time to begin the next Census.
It wasn’t until 1890 that an electric tabulating system helped speed up counting.
Another large problem was — and still is — the undercounting of some minority populations.
One solution was to use statistical sampling to adjust the data for these populations, but the Supreme Court ruled in 1999 that using these methods to allocate seats in the U.S. House of Representatives violated the Census Act of 1976.
It did not rule out the use of sampling for redistricting or the allocation of federal resources.
In 2010, the Census Bureau estimated that it missed 1.5 million minorities.
All this is to say: Counting large numbers of people is hard, and it’s important to keep that in mind when looking at Census data.
Before 2010, there were two surveys: a “short” survey, which all households received, and a “long-form” survey, which one in six households received in 2000.
But in 2010, that changed.
Every household received a 10-question survey for the decennial Census.
But the in-depth questions went to the American Community Survey, which samples about 295,000 addresses a month.
It is a running survey, which gives more up-to-date data compared to the previous long-form Census survey taken once every 10 years.
Not everyone supports this mandatory survey, but it helps us answer some very important questions.
Because it’s not a survey of the entire population, you can’t answer questions about small groups of people using just one year's worth of data — otherwise known as “one-year estimates.”
So the Census combines data from multiple years to provide a better estimate for smaller locales or demographic groups.
That’s what it means to use “three-year estimates” or “five-year estimates.”
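To get a feel for why pooling years helps, here is a rough sketch in Python. It is not how the Census Bureau actually computes its estimates. It assumes simple random sampling, and the town, its sample size of 200 people a year and the 10 percent share are all made up for illustration. The only real-world detail is that ACS margins of error are published at the 90 percent confidence level, which is where the 1.645 comes from.

```python
import math


def margin_of_error(share, sample_size, z=1.645):
    """Approximate 90% margin of error for a share.

    Assumes simple random sampling, which is a simplification
    of the real ACS methodology.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


# Hypothetical small town (made-up numbers): 200 people surveyed
# per year, and 10% of them belong to the group we care about.
share = 0.10
people_per_year = 200

one_year = margin_of_error(share, people_per_year)
five_year = margin_of_error(share, people_per_year * 5)

print(f"one-year estimate:  {share:.0%} +/- {one_year:.1%}")
print(f"five-year estimate: {share:.0%} +/- {five_year:.1%}")
```

Under these assumptions, pooling five years of responses shrinks the margin of error by a factor of the square root of 5, or a bit more than half. The trade-off is that a five-year estimate describes the whole period, not any single year in it.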
There are multiple ways to get the data, each with its own upsides and downsides.
In short, the easier the interface, the less flexibility you have to find custom datasets.
I’ll run through three options here.
The easiest site is CensusReporter.org.
You input a location you’re looking for, then it walks you through the available datasets.
For example, we can find the median age by sex for Connecticut.
From there, you can create various geographic breakdowns.
In Connecticut, if you want to divide it by town, use “county subdivisions.”
If you want to divide it by Census tracts, which are small areas with about 4,000 people each, then you can select that.
The magic, but also the downside, is that Census Reporter tries to guess which dataset you want.
It’ll switch between ACS five-year estimates and one-year estimates, and you don’t have to make that choice.
Once you have the dataset you want, you can download the data at the top right of the page.
American FactFinder gives you a little more flexibility.
You can select your topic and your geographic constraints, and then it will push out datasets that match your query.
It’s a lot easier to explore what’s available with FactFinder, because it categorizes the different datasets available.
Here's a walkthrough.
The Integrated Public Use Microdata Series, or IPUMS, is an incredibly powerful tool that lets you extract data from 1850 to the present.
The other tools don’t allow for this.
For example:
Do you want to find the average household size from 1900 to the present?
This is your tool.
There are a lot of nuances to historic Census data, with various things affecting it, such as when the Census Bureau began primarily mailing the surveys versus conducting them in person.
But if you’re using IPUMS, you might already know this.
In other words: Since this tutorial is for beginners, I just mentioned IPUMS so you know that it exists.
Have a long weekend to learn?
Here are some good YouTube videos to walk you through using IPUMS.