Setting up a reproducible data analysis workflow in R
Zip of files referred to in this walkthrough
Super simple way to add formatting to plain text
Created by John Gruber (of Daring Fireball) as a simple way for non-programming types to write in an easy-to-read format that could be converted directly into HTML.
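To make the "converted directly into HTML" part concrete, here is a minimal sketch. It assumes the commonmark package is installed; the sample text and the example.com link are made up for illustration and are not part of the walkthrough files.

```r
# Assumption: the commonmark package is installed
# (install.packages("commonmark")).
library(commonmark)

# A few lines of Markdown; this sample text is hypothetical.
md <- c(
  "# A heading",
  "",
  "Plain text with *emphasis* and a [link](https://example.com)."
)

# Convert the Markdown source to an HTML string.
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```

The source stays readable as plain text, and the converter handles the tags.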
boston_payroll %>%
  group_by(TITLE) %>%
  summarise(across(c(REGULAR, OVERTIME), mean))
An R Markdown (.Rmd) file is a record of your analysis process.
It contains the code that a scientist needs to reproduce your work along with the narration that a reader needs to understand your work.
Literate programming
Everyone’s doing it!
The idea that data analyses, and more generally, stories, are published with their data and software code so that others may verify the findings and build upon them.
At the click of a button, or by typing a command, you can rerun the code in an R Markdown file to reproduce your work and export the results as a finished report.
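The "type of a command" route might look like the sketch below. It assumes the rmarkdown package and pandoc are installed (RStudio bundles pandoc). The tiny report written to a temporary file is hypothetical and only there so the example runs on its own; in practice you would point render() at your own .Rmd file.

```r
# Assumption: the rmarkdown package and pandoc are available.
library(rmarkdown)

# Build the chunk fence programmatically so this example stays readable.
fence <- strrep("`", 3)

# Write a small, hypothetical .Rmd file to a temporary location.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Narration goes here in plain Markdown.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",   # cars is a dataset that ships with R
  fence
), rmd_path)

# Re-run every code chunk and export the finished report.
out_path <- render(rmd_path, quiet = TRUE)
out_path
```

render() returns the path of the file it wrote, so the same call can be dropped into a script that rebuilds the report whenever the data changes.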
R Markdown supports dozens of static and dynamic output formats including
Be sure to get LaTeX installed first.
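One quick way to check whether a LaTeX engine is visible to R before knitting to PDF is sketched below. It uses only base R and assumes that pdflatex is the engine you care about; the tinytex suggestion in the message is one option among several, not a requirement of this walkthrough.

```r
# Sys.which() returns "" when the executable is not found on the PATH.
has_latex <- unname(nzchar(Sys.which("pdflatex")))

if (has_latex) {
  message("pdflatex found: PDF output should be possible.")
} else {
  # Assumption: tinytex is one convenient installer; any LaTeX
  # distribution that provides pdflatex should also work.
  message("No pdflatex on the PATH. One option: tinytex::install_tinytex()")
}
```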
Not well.
That’s fine, I’m not mad
library(DT)
library(dplyr)
library(readr)
payroll <- read_csv("../data/bostonpayroll2013.csv") %>%
  select(NAME, TITLE, DEPARTMENT, REGULAR, OVERTIME) %>%
  filter(row_number() < 100)
datatable(payroll, extensions = 'Buttons', options = list(
  dom = 'Bfrtip',
  buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
))
Reporters sometimes aren’t very organized. Send them links instead!