Friday, September 12, 2014

Embedding RData files in Rmarkdown files for more reproducible analyses


For those of us interested in reproducible analysis, Rmarkdown is a great way of communicating our code to other researchers. RStudio, in particular, makes it very easy to create attractive HTML documents containing text, code, and figures, which can then be sent to colleagues or put on the internet for anyone to see. If you aren't using Rmarkdown for your statistical analyses, I recommend you start; you'll never go back to plain script files again (and your colleagues won't want you to).

In this post, I describe how to improve your Rmarkdown documents by embedding data that anyone viewing the document in a modern browser with JavaScript enabled can download. For a quick look, see the example Rmd file and resulting HTML file.

One of the drawbacks of Rmarkdown, from a reproducible analysis perspective, is that the data are not part of the document itself. Typically, an Rmarkdown file uses R code to load a file from your disk, and when you send the resulting HTML file to a colleague, or put it on the internet, that data file stays behind. It must be sent separately in an email or placed on a server to be downloaded.

This raises the possibility that the data could get separated from the code, which I think is terrible for reproducible analysis. In my mind, the data and the document should travel together as a single file. What we would like is a method of encoding R data into the HTML file such that any user who has the HTML file can download the data, without even needing access to the internet.

As it turns out, files can be embedded in an HTML document via the data URI scheme. All we need is an R function that encodes the data and produces a link for downloading it.
```r
setDownloadURI = function(list, filename = stop("'filename' must be specified"),
                          textHTML = "Click here to download the data.",
                          fileext = "RData", envir = parent.frame()){
  # dataURI() comes from the base64enc package; stop with a clear message if it is missing
  if (!requireNamespace("base64enc", quietly = TRUE))
    stop("package 'base64enc' is required")
  # random id, so several download links can coexist in one document
  divname = paste(sample(LETTERS), collapse = "")
  # save the requested objects to a temporary RData file, deleted when the function exits
  tf = tempfile(pattern = filename, fileext = paste0(".", fileext))
  on.exit(unlink(tf))
  save(list = list, file = tf, envir = envir)
  filenameWithExt = paste(filename, fileext, sep = ".")

  # base64-encode the whole file into a data URI
  uri = base64enc::dataURI(file = tf, mime = "application/octet-stream", encoding = "base64")
  # write an empty placeholder (a span, so links are not nested) plus JavaScript
  # that builds the download link inside it
  cat("<span id='", divname, "'></span>
    <script>
    var a = document.createElement('a');
    var div = document.getElementById('", divname, "');
    div.appendChild(a);
    a.setAttribute('href', '", uri, "');
    a.innerHTML = '", textHTML, "' + ' (", filenameWithExt, ")';
    if (typeof a.download != 'undefined') {
      a.setAttribute('download', '", filenameWithExt, "');
    }else{
      a.setAttribute('onclick', 'confirm(\"Your browser does not support the download HTML5 attribute. You must rename the file to ", filenameWithExt, " after downloading it (or use Chrome/Firefox/Opera). \")');
    }
    </script>",
    sep = "")
}
```
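The first argument of the function, list, is a character vector containing the names of the variables to save in the RData file. The filename argument is the name offered for the download (without the extension), and textHTML is the text of the link.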

Once this function is declared, all we need to do is call it in our Rmd file. If we set the chunk option results = 'asis' on the R code chunk, the HTML that the function prints is injected directly into the compiled document, so anyone with the HTML file can download the embedded data as an RData file.
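The encoding step is handled by dataURI() from the base64enc package. To make concrete what ends up in the link's href attribute, here is a minimal base-R sketch of the same idea, following the data:<mime>;base64,<payload> layout of a data URI. The helpers base64EncodeRaw() and fileToDataURI() are hypothetical names written only for illustration; in practice base64enc::dataURI() is the better choice.

```r
# Minimal base-R sketch of a base64 data URI: data:<mime>;base64,<payload>
# Illustration only; setDownloadURI() relies on base64enc::dataURI() instead.

# The 64-character base64 alphabet
b64chars = c(LETTERS, letters, 0:9, "+", "/")

# Hypothetical helper: base64-encode a raw vector
base64EncodeRaw = function(bytes) {
  bytes = as.integer(bytes)
  pad = (3 - length(bytes) %% 3) %% 3            # zero bytes needed to complete the last group
  bytes = c(bytes, rep(0L, pad))
  m = matrix(bytes, nrow = 3)                    # one column per 3-byte group
  v = m[1, ] * 65536L + m[2, ] * 256L + m[3, ]   # 24-bit value of each group
  # split each 24-bit value into four 6-bit indices into the alphabet
  idx = rbind(bitwShiftR(v, 18L),
              bitwAnd(bitwShiftR(v, 12L), 63L),
              bitwAnd(bitwShiftR(v, 6L), 63L),
              bitwAnd(v, 63L))
  out = b64chars[as.vector(idx) + 1L]
  # characters that come only from padding bytes are replaced with '='
  if (pad > 0) out[(length(out) - pad + 1):length(out)] = "="
  paste(out, collapse = "")
}

# Hypothetical helper: read a file as raw bytes and wrap it in a data URI
fileToDataURI = function(file, mime = "application/octet-stream") {
  bytes = readBin(file, what = "raw", n = file.size(file))
  paste0("data:", mime, ";base64,", base64EncodeRaw(bytes))
}

# Usage: embed a small RData file and look at the start of the URI
x = 1:10
tf = tempfile(fileext = ".RData")
save(x, file = tf)
uri = fileToDataURI(tf)
substr(uri, 1, 45)
```

The payload is plain ASCII, which is what allows it to sit inside an HTML attribute; the browser decodes it back to the original bytes when the link is clicked.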

Unfortunately, Blogger will not allow me to embed the data in a post; therefore, a complete, self-contained example Rmd file can be found here, and the resulting HTML file can be found here.

Keep in mind, however, that the data file is actually embedded in the HTML file, so the HTML file will be large if your data file is large. Also consider that the data are base64-encoded, which uses 4 characters for every 3 bytes and so makes the embedded data about a third larger than the equivalent binary RData file. For very large data sets, one might consider hosting them outside of the HTML file; but for many purposes, the technique I describe will make it easier to share reproducible analyses.
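The one-third figure is easy to check. Below is a small sketch that estimates how many characters an embedded RData file adds to the HTML; the helper names base64Length() and embeddedSize() are hypothetical, and the estimate assumes the "data:application/octet-stream;base64," prefix used above. It ignores the few hundred characters of surrounding script.

```r
# Rough size estimate for an embedded RData file.
# base64 turns every 3 input bytes into 4 output characters, padding the
# final group, so an n-byte file becomes 4 * ceiling(n / 3) characters.
base64Length = function(nBytes) 4 * ceiling(nBytes / 3)

# Hypothetical helper: characters one embedded file adds to the HTML,
# assuming the data URI prefix written by setDownloadURI()
embeddedSize = function(file) {
  prefix = "data:application/octet-stream;base64,"
  nchar(prefix) + base64Length(file.size(file))
}

# Example with a simulated data set
exampleData = data.frame(x = rnorm(1e4), y = runif(1e4))
tf = tempfile(fileext = ".RData")
save(exampleData, file = tf)
sizes = c(RData = file.size(tf), embedded = embeddedSize(tf))
sizes
# ratio of embedded size to RData size; close to 4/3 for all but tiny files
sizes[["embedded"]] / sizes[["RData"]]
```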

13 comments:

  1. This is a fantastic idea! Could you also store the Rmd file as an R object in the list of objects to save? That would avoid pasting the Rmd code in the HTML file or keeping it as a separate file.

    1. In my haste (and impatience I guess) I came up with this little snippet:

      1. Import your Rmd script as an R object using
      script1 <- readChar("script.Rmd", file.info("script.Rmd")$size)
      2. Add instructions to save this object to the computer using
      cat(script1, file = "script1.Rmd")
      3. And then instruct to open "script1.Rmd" and run it.

  2. Hi Tom, that's an interesting idea. That way, the Rmd can always be had by simply downloading it. But why not compress the Rmd file and then insert the compressed file into the HTML document in the same manner that the RData file is inserted above?

    1. A bit of searching revealed that on Windows this might not work, since zip might not be in the path. I guess saving it in the RData object is the next best thing.

  3. Good point, Richard. That might make it more straightforward.
