Setting up a reproducible data analysis workflow in R

Zip of files referred to in this walkthrough

Why R + Markdown?

Markdown

Super simple way to add formating to plain text

  • headers
  • bold
  • bullet lists
  • links

Created by John Gruber (of Daring Fireball) as a simple way for non-programming types to write in an easy-to-read format that could be converted directly into HTML.

R code

boston_payroll %>%
  group_by(TITLE) %>%
  summarise_each(funs(mean), REGULAR, OVERTIME)

.Rmd files

An R Markdown (.Rmd) file is a record of your analysis process.

It contains the code that a scientist needs to reproduce your work along with the narration that a reader needs to understand your work.

Literate programming

Show your work

Everyone’s doing it!

Reproducible research

The idea that data analyses, and more generally, stories, are published with their data and software code so that others may verify the findings and build upon them.

  • Look for insight deeper than the summary report
  • Verify details yourself
  • Learn new techniques from looking at other processes
  • Figure out ways to apply the analysis to your own needs
  • Your future self will thank you for documenting your process now

At the click of a button, or the type of a command, you can rerun the code in an R Markdown file to reproduce your work and export the results as a finished report.

R Markdown supports dozens of static and dynamic output formats including

  • HTML
  • PDF
  • MS Word
  • Beamer
  • HTML5 slides
  • Tufte-style handouts
  • books
  • dashboards
  • scientific articles (white pages)
  • websites

Produce slick-looking PDFs

Be sure to get LaTex installed first.

IPython Notebooks

How .rmd files render in Github

Not well.

That’s fine, I’m not mad

library(DT)
library(dplyr)
library(readr)
payroll <- read_csv("../data/bostonpayroll2013.csv") %>% 
  select(NAME, TITLE, DEPARTMENT, REGULAR, OVERTIME) %>% 
  filter(row_number()<100)
datatable(payroll, extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

Output .Rmd files to HTML

Host the files on an internal server

I have the code above aliased so I can type in a keyword and it will move all html files in a certain directory to an S3 server while preserving the structure of the subdirectories.

Host the HTML of your analysis

  • Internally or
  • On Github pages
  • Share with reporters and editors, let them explore your analysis further
  • Let them download customized spreadsheets with buttons in the DT (datatable) package
  • If your analysis gets updated, keep the file name then they only have to refresh their link
  • Then get into Shiny!

Coding in RMarkdown

1. Open a new .Rmd file

at File > New File > R Markdown.

Title the R Markdown file and select HTML as the output for now.

.Rmd structure

YAML HEADER

Optional section of render options written as key:value pairs.

  • At start of file
  • Between lines of (3 dashes)

CODE CHUNKS

Chunks of embedded R code. Each chunk:

  • begins with ```{r} (the key to the left of 1)
  • ends with ```

TEXT

Narration formatted with markdown mixed with code chunks.

2. Write document

Edit the default template by putting in your own code and text.

Intersperse the text with your code to tell a story.

2b. Label your chunks of code

You’ll see why in a moment.

2c. Notebooks style

You can run individual chunks of code before generating the full report to see how it looks.

Click the green arrow next to each chunk.

3. Knit document to create report

Use knit button or type render() to knit

3b. Check out the build log

Down in the console. Warnings and errors will appear.

Also measures progress by chunks, which is why it’s important to label them.

4. Preview output in IDE window

5. Output file

You have a .Rmd file and clicking knit HTML also generated a .html file

Datatables

You need to load libraries just like you would a normal script.

Let’s look at the data in R Markdown with a new package called DT that uses the Datatables jquery library.

Example from Part 1 in the chunks.MD file.

Open and knit chunks/01-chunk.Rmd

Specific features

Hide warnings, messages

Adding warning=F and message=F hid the little messages.

Example from Part 2 in the chunks.MD file.

Open and knit chunks/02-chunk.Rmd

Hide code

If the person you’re sharing this with has no interest in the code and only the quick results, use echo=F to hide the chunk of code and just display the output.

Example from Part 3 in the chunks.MD file.

Open and knit chunks/03-chunk.Rmd

Inline R code

Embed lines of R code within the narrative with

Example from Part 4 in the chunks.MD file.

Open and knit chunks/04-chunk.Rmd

Pretty tables

Make pretty tables with the knitr package and the kable function.

Example from Part 5 in the chunks.MD file.

Open and knit chunks/05-chunk.Rmd

Change theme and style

Change the appearance and style of the HTML document by changing the theme up top.

Options from the Bootswatch theme library includes:

  • default
  • cerulean
  • journal
  • cosmo

highlights

  • tango
  • pygments
  • kate

Example from Part 6 in the chunks.MD file.

Open and knit chunks/06-chunk.Rmd

Table of contents

Add a floating table of contents by changing html_document to toc: true and toc_float: true.

Example from Part 7 in the chunks.MD file.

Open and knit chunks/07-chunk.Rmd

Next steps?

Exporting as a PDF will require LaTex installed first * Get it from latex-project.org or MacTex

Check out all the features of R Markdown at RStudio

Publish your results to Github pages

Read more on how

Try to publish directly to Wordpress

I haven’t actually made this work but maybe you can try