Structure of data

Andrew Tran
September 23, 2015

Create a data memo for your beat

Due by next class

Write me a draft paper that describes the kinds of data that exist on the beat you want to focus on:

  • That you know exist
  • That you hope to find
  • Or that you will collect yourself

Regarding the data you're envisioning, answer

  • Why this data source is interesting and what you’re curious about.
  • Where this data exists, or where you expect to find it
  • Anticipated problems in collecting or analyzing it

This is just brainstorming. You don't have to commit to what you come up with in this homework for your midterm or final project.

This will hopefully get you thinking about data as early as possible. I will give feedback and help you refine your research possibilities.

Types of data

  • Excel
  • CSV (Comma Separated Values)
  • TSV (Tab Separated Values)
  • PDF (Portable Document Format) Original
  • PDF Scanned
  • XML (Extensible Markup Language)
  • JSON (JavaScript Object Notation)

The thing in common with all these data formats?

Though modern spreadsheet software usually has complicated file formats, data, at its core, is just text.
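A minimal sketch of the "just text" idea in Python. The record and the helper name `to_delimited` are made up for illustration and do not come from any real dataset. The same record is written as CSV, TSV and JSON, and each result is an ordinary string.

```python
import csv
import io
import json

# Hypothetical record, used only for illustration.
record = {"name": "Lisa", "hours": "40"}

def to_delimited(row, delimiter):
    """Write one header line and one data line using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(row.keys())
    writer.writerow(row.values())
    return buffer.getvalue()

csv_text = to_delimited(record, ",")
tsv_text = to_delimited(record, "\t")
json_text = json.dumps(record)

# Every format is an ordinary Python string, i.e. plain text.
for text in (csv_text, tsv_text, json_text):
    print(repr(text))
```

Only the punctuation between the values changes from one format to the next.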

Why am I obsessed with data structure?

After World War II, manufacturing was at an all-time high.

But the bottleneck was in the shipping of products. As much as 60 to 75 percent of the cost of shipping cargo was spent on what happened while the ship sat at the dock: waiting for longshoremen to hand-sort and carry items on and off the boats.

A ship could spend as many days loading and unloading its cargo as it took to cross the ocean.

Metaphor

The container is boring, yes. It's much less interesting than how things are built and things are consumed.

However, the shipping container is a vital part of the global economy today.

So yes, spreadsheets are boring compared to the reporting and the fancy graphics that come from them. But the better organized your data is, the easier it will be to do cool stuff with it.

Quick Exercise

Two sentences:

  • The Apple Company, based in Manhattan, hired Lisa Apple for a position at its Queens store at 40 hrs/week.

  • Nick Apples commutes to Brooklyn and spends three hours a day there selling apples from a fruit stand.

Humans can figure out the data in these sentences pretty easily. It would take some advanced programming for a computer to parse it.

How could you structure these sentences into a spreadsheet?

First name,Last name,Employer,Work location,Hours/wk
Lisa,Apple,Apple Company,Queens,40
Nick,Apples,A fruit stand,Brooklyn,21

Enter that text into a plain-text editor and save it.

Then open it in Excel or Google Sheets. You should see it rendered as if you had entered the data in a spreadsheet to begin with.
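A spreadsheet program is not the only thing that can read that text. Here is a small sketch using Python's built-in `csv` module. The variable name `csv_text` is my own, and `io.StringIO` stands in for the file you saved.

```python
import csv
import io

# The three lines from the exercise, exactly as they would sit in a text file.
csv_text = """First name,Last name,Employer,Work location,Hours/wk
Lisa,Apple,Apple Company,Queens,40
Nick,Apples,A fruit stand,Brooklyn,21
"""

# DictReader uses the first line as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["First name"], row["Work location"], row["Hours/wk"])
```

Each row comes back as a dictionary keyed by the header line, which is why a clean header row matters.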

Pipes

You can use anything you want as a delimiter. Here’s the same data, delimited by “pipes”:

First name|Last name|Employer|Work location|Hours/wk
Lisa|Apple|Apple Company|Queens|40
Nick|Apples|A fruit stand|Brooklyn|21
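Reading pipe-delimited text takes one extra argument. This is a sketch with Python's built-in `csv` module, assuming one record per line. The variable names are illustrative.

```python
import csv
import io

pipe_text = """First name|Last name|Employer|Work location|Hours/wk
Lisa|Apple|Apple Company|Queens|40
Nick|Apples|A fruit stand|Brooklyn|21
"""

# Tell the reader which character separates the fields.
reader = csv.reader(io.StringIO(pipe_text), delimiter="|")
header, *records = list(reader)

print(header)
for record in records:
    print(record)
```

The parsed rows are the same as with commas. Only the `delimiter` setting differs.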

Faces of Death Row

Where'd the data come from?

  • Texas Department of Criminal Justice
  • Data on the cost of death row inmates did not exist
  • So the project settled for basic information and mug shots instead
  • Mug shots came from the commissary department
  • Ultimately, it cost about $250, down from $1,500.
  • Crime summaries were compiled from TDCJ crime records, court documents and news articles

What it runs on:

  • Tribune’s App Kit, which is built on Gulp, a task runner written in Node.js
  • The data is pulled into the app from a Google Sheets spreadsheet
  • JavaScript/jQuery and SCSS were used to build the front end

Exercise

Look at the first few entries in the Death Row interactive.

How would you structure a spreadsheet based on the information listed?

Create a CSV file from scratch based on the first two entries.

Upload the file to your private GitHub repo.

I will walk you through it if you have trouble.

Here's the tutorial if you need a refresher.
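If you would rather build the CSV with code than type it by hand, here is one possible sketch. Everything in it is an assumption: the column names, the filename and the placeholder rows are hypothetical and are not taken from the interactive. Decide on your own columns first, then fill in the real values.

```python
import csv
from pathlib import Path

# Hypothetical column names; choose your own from what the interactive shows.
fieldnames = ["name", "age", "county", "crime_summary"]

# Placeholder rows only. Replace them with the first two entries you find.
rows = [
    {"name": "PLACEHOLDER ONE", "age": "", "county": "", "crime_summary": ""},
    {"name": "PLACEHOLDER TWO", "age": "", "county": "", "crime_summary": ""},
]

output_path = Path("death_row_sample.csv")  # hypothetical filename

# newline="" lets the csv module control line endings itself.
with output_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(output_path.read_text(encoding="utf-8"))
```

The writer also takes care of quoting, so a crime summary that contains commas will not spill into extra columns.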

Joining

High school

  • Lisa
  • Gary
  • Chelsea

College

  • Monica
  • Gary
  • Chelsea
  • Lisa
  • Sam

High school -> College

  • Lisa -> Lisa
  • Gary -> Gary
  • Chelsea -> Chelsea

Lisa, Gary, and Chelsea are high schoolers in the college crew

College -> High school

  • Monica -> X
  • Gary -> Gary
  • Chelsea -> Chelsea
  • Lisa -> Lisa
  • Sam -> X

Lisa, Gary, and Chelsea are college kids from the high school crew

AND

Monica and Sam are college kids NOT from the high school crew
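Both directions of the join can be sketched in a few lines of Python. The helper name `left_join` is my own, and `None` plays the role of the X for a name with no match.

```python
high_school = ["Lisa", "Gary", "Chelsea"]
college = ["Monica", "Gary", "Chelsea", "Lisa", "Sam"]

def left_join(left, right):
    """Pair every name in `left` with its match in `right`, or None if absent."""
    right_names = set(right)
    return [(name, name if name in right_names else None) for name in left]

# High school -> College: every high schooler finds a match.
hs_to_college = left_join(high_school, college)

# College -> High school: some college names have no match.
college_to_hs = left_join(college, high_school)

matched = [name for name, match in college_to_hs if match is not None]
unmatched = [name for name, match in college_to_hs if match is None]

print(hs_to_college)
print(matched)
print(unmatched)
```

Which list you start from decides which names can end up unmatched. Starting from the college list is what surfaces Monica and Sam.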