JRN-418

Data Journalism at CCSU, Fall 2015

Joining with Google Fusion Tables

This walkthrough accompanies a lecture about the Bristol Herald Courier's story about missing royalties.

Start by downloading the data sets

Click on the January Escrow Agent Summary link under 2009 Escrow Agent Summaries


Select January 2009. Let the page reload.

Click Export to Excel

 


It will download a file called 'frmPrint.aspx' -- find it on your computer and rename it to 'well_production.xls'

 


Bring both Excel spreadsheets into Google Spreadsheets to clean up

 


Select 'Insert new sheet(s)'

Rename the sheets appropriately: one for production, the other for payments

 


Go to the production sheet and delete the buffer row and column.

 


OK, we need to clean it up even more. There are almost 7,000 rows in the production versus more than 800 in the escrow accounts sheet.

Go to the payments sheet. Scroll down and notice two different types of data. The one on top are the accounts that got money. The data below are the wells that made no money. That's what we want to focus on. Delete all the extra rows above it except for the header titles. (Everything above row 744).

Delete zero balance at the bottom, too.

 


Go to the production sheet, and notice that there are a few blank cells and rows.

First, freeze the top row.

View > Freeze > 1 row

 


Select the little gray arrow in column C and click Sort sheet A - Z.

 


All the extra rows are at the bottom now. Select the rows (6656 and below) and delete them.

 


TIME TO CLEAN UP THE COLUMNS

To match up the spreadsheets successfully, the names need to match exactly.

That means we have to get rid of anything that might throw it off like spaces, dashes, or parenthesis.

Select the 'Unit ID' column in the Payments sheet and press control+f

Click the '...' to expand the options.

 


In the 'Find' field, put in '-' and in the 'Replace with' field, leave it blank.

This will essentially delete all dashes.

BUT, we have to limit the range, we don't want to mess with any other data in the sheet.

So next to search, select 'specific range' from the pull down menu and click the little grid.

 


Select the B column and it will limit the search and replace to that column.

 


It should go back to the previous pop up. Select 'Replace all'.

You should get a 'Replaced all instances of '-' with ''

 


Repeat the step with a space to delete spaces.

 


Now, go to the Production sheet.

Get rid of dashes, spaces, #, and "W/PL" (Don't forget the space in front of W/PL)

 


Ah, one more thing, there are entries with parentheses like 9550(CCC165).

We don't want anything in the parentheses and we don't want the parentheses either. We need to use a formula to get rid of them.

In G2 of Production, type in the formula: =split(C2, "(")

What this does is it looks at cell C2, and split it when it notice an open parentheses sign.

Drag this formula down so that it applies to all rows in the spreadsheet. You'll see the leftovers in the column to the right. You can delete that one.

 


OK, go back to Payments and delete the empty columns.

 


Now, let's download each of these sheets as a separate CSV file.

 


You should have two CSV files.

One called Production, the other called Payments.

 


Go to your Google Drive.

Click on New > More > Connect more apps

 


Search for Fusion Tables and click + Connect

 


Go back to your Google Drive. Click on New > More > Fusion Tables

 


Click From This Computer > Choose File.

Select the Production csv. Click Next. Click it agian. Then click Finish.

 


Click File > New table and bring in the Payments csv like before.

 


Now in the Payments fusion table, click File > Merge

 


Select Production fusion table and click next.

 


Now, select the columns that you want joined.

It should be UnitID and Copy. Click next.

 


You can select the columns you want brought over from Production. Leave everything checked. Click Merge.

 


A new table has been created.

Go ahead and click over to it and check it out.

 


Click file and download

 


Pop it open. Go ahead and sort it so the blank rows float to the bottom.

 


It looks like 56 wells that got no money actually produced something.

That's 56 out of 154. Almost a third.

 


It's by no means perfect yet. The wells need to be cleaned up further and rejoined.

But it's a start.