Setting up a reproducible data analysis workflow in R

Zip of files referred to in this walkthrough

Version control system (VCS)

Git has been “repurposed” beyond software development

Journalists use it to publish their methodology, and also to share raw and summarized data.

For teams to collaborate

It’s kinda complicated

So why?

Tough to justify for someone solo.

But it’s worth learning because of the capabilities for communicating your analysis and for future collaboration.

Setting up GitHub

A walkthrough to read later that explains how to get connected:

http://happygitwithr.com/

Options

Show off! Collaborate!

  • The R community is active on GitHub
  • The more often you use it, the more often you can use others’ code and data
  • Easily import from GitHub repos into your workflow
  • Simple to run Shiny apps locally with the runGitHub() function
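A minimal sketch of the last two bullets. The helper github_raw_url() and the "some-user"/"some-repo" names are made up for illustration; swap in a real public repo before reading data. The runGitHub() call uses the example app from the shiny documentation.

```r
# Hypothetical helper: build the raw-file URL for a file in a public GitHub repo.
github_raw_url <- function(user, repo, path, branch = "master") {
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s",
          user, repo, branch, path)
}

# Placeholder names, not a real repo.
csv_url <- github_raw_url("some-user", "some-repo", "data/example.csv")

if (interactive()) {
  # Read a CSV straight from GitHub (needs a network connection and a
  # URL that actually exists, so replace the placeholders first).
  dat <- read.csv(csv_url)

  # Download and launch a Shiny app hosted on GitHub.
  shiny::runGitHub("shiny_example", "rstudio")
}
```

The network calls sit behind interactive() so the script can be sourced without a connection.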

Markdown and Rmarkdown

GitHub loves Markdown. Even R Markdown.

It renders Markdown files as HTML when you view them in the repo.
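One possible way to get a GitHub-friendly Markdown file out of an R Markdown document is rmarkdown's github_document output format. This is a sketch, assuming the rmarkdown package and pandoc are installed; the file name and contents are invented for the example.

```r
# Write a tiny R Markdown file to a temporary folder (hypothetical content).
rmd_path <- file.path(tempdir(), "analysis.Rmd")
writeLines(c(
  "---",
  "title: \"Example analysis\"",
  "output: github_document",
  "---",
  "",
  "The built-in cars data has `r nrow(cars)` rows."
), rmd_path)

# Knit it to a .md file that GitHub can display.
# Skipped quietly when rmarkdown or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  md_path <- rmarkdown::render(rmd_path,
                               output_format = "github_document",
                               quiet = TRUE)
}
```

Committing the resulting .md file alongside the .Rmd means readers see the knitted results on GitHub without running anything.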

Setting up GH Pages

After uploading your repo, click on Settings

Change the Source from None to "master branch" or "master branch /docs folder" (depending on where you want your root to be)
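If you pick the docs folder as the source, one way to fill it is to render an R Markdown page into it. This is only a sketch: the "my-repo" folder and index.Rmd are placeholders, and it assumes rmarkdown and pandoc are installed.

```r
# Hypothetical local copy of the repo, created in a temp folder for the demo.
repo_dir <- file.path(tempdir(), "my-repo")
docs_dir <- file.path(repo_dir, "docs")
dir.create(docs_dir, recursive = TRUE, showWarnings = FALSE)

# A minimal home page for the site.
index_rmd <- file.path(repo_dir, "index.Rmd")
writeLines(c(
  "---",
  "title: \"Project home\"",
  "output: html_document",
  "---",
  "",
  "Hello from GitHub Pages."
), index_rmd)

# Render index.html into docs/, the folder GitHub Pages will serve.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(index_rmd, output_dir = docs_dir, quiet = TRUE)
}
```

After committing and pushing docs/index.html, the page becomes the root of the site.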

.gitignore large files

  • Don’t include files larger than 100 MB
  • Don’t include your keys or passwords
  • Try to exclude any extraneous files like .Rhistory

You can borrow this one
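Separately from that file, here is a rough sketch of how you might check a project for oversized files and write a starter .gitignore from R. The helper find_large_files() is invented for this example, the ignore entries are just common choices, and the demo uses a tiny size limit so it runs on small throwaway files.

```r
# Hypothetical helper: paths (relative to `dir`) of files over `limit_mb` megabytes.
find_large_files <- function(dir = ".", limit_mb = 100) {
  files <- list.files(dir, recursive = TRUE, all.files = TRUE)
  sizes <- file.size(file.path(dir, files))
  files[!is.na(sizes) & sizes > limit_mb * 1024^2]
}

# Demo in a throwaway folder with one "big" and one small file.
demo_dir <- file.path(tempdir(), "gitignore-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(2048), file.path(demo_dir, "big.bin"))
writeLines("tiny", file.path(demo_dir, "small.txt"))

# 0.001 MB is about 1 KB; in a real project keep the default of 100.
large <- find_large_files(demo_dir, limit_mb = 0.001)

# Starter .gitignore: R session clutter, a file that often holds keys,
# and whatever oversized files were found.
ignore_lines <- c(".Rhistory", ".RData", ".Renviron", large)
writeLines(ignore_lines, file.path(demo_dir, ".gitignore"))
```

Run find_large_files() from the project root before the first commit; a large file is much harder to remove once it is in the history.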

Include readmes and data dictionaries

  • BuzzFeed is a good model: they index their story links and repos as a table
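A data dictionary can start as a generated skeleton that you fill in by hand. The sketch below is one way to do that in base R; make_data_dictionary() and the sample data frame are made up for illustration.

```r
# Hypothetical helper: one row per column, with type and missing-value count.
# The description column is left blank to be filled in by hand.
make_data_dictionary <- function(df) {
  data.frame(
    column      = names(df),
    type        = unname(vapply(df, function(x) class(x)[1], character(1))),
    n_missing   = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    description = "",
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data.
example_df <- data.frame(
  city       = c("A", "B"),
  population = c(100L, NA),
  stringsAsFactors = FALSE
)

dict <- make_data_dictionary(example_df)

# Save it next to the data so it travels with the repo.
write.csv(dict, file.path(tempdir(), "data_dictionary.csv"), row.names = FALSE)
```

Commit the CSV with the data and link to it from the readme.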

Please don’t create monster data repos