Reproducible research with R, Knitr, Pandoc and Word


Below I briefly outline why Pandoc is an essential part of my research workflow, and demonstrate how to integrate it with a bibliographic system and code written in R to produce high-quality Word or PDF documents. I also include all the functions needed to get this working fast.

Knitr is great. I'm writing this in it right now. It 'knits' markdown together with R code and outputs some pretty excellent HTML pages. The difficulty is getting these into Word for final editing, emailing to colleagues, or similar. Try copy and paste, for instance, and you'll get the text and the formatting no problem, but any plots or tables will likely be replaced by Word's 'missing image' graphic. The solution: Pandoc.

There is an R package, pander, which works well. But for full functionality you're best off downloading the Pandoc application itself from the Pandoc website.

Write a markdown (.Rmd) document in RStudio, set your working directory to the location of your file, then compile it as follows:

name = "demo"
library(knitr)
knit(paste0(name, ".Rmd"), encoding = "utf-8")
system(paste0("pandoc -o ", name, ".docx ", name, ".md"))

The code above first knits demo.Rmd into demo.md, then uses system() to run Pandoc on the command line from within R, converting demo.md to demo.docx.

And like magic the document is created.
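If you want something a little more defensive, here is a minimal sketch of the same two steps. It assumes pandoc is on your PATH; the function names `pandoc_cmd` and `knit_to_word` are my own, not part of knitr or Pandoc. It quotes the file names so that spaces in them survive the shell, and it knits in a fresh environment so that a variable called `name` inside the .Rmd file cannot overwrite the one used to build the command.

```r
# Build the command line for <name>.md -> <name>.<ext>.
# shQuote() protects file names that contain spaces.
pandoc_cmd <- function(name, ext = "docx") {
  paste("pandoc -o",
        shQuote(paste0(name, ".", ext)),
        shQuote(paste0(name, ".md")))
}

# Hypothetical wrapper: knit, then convert, stopping early on failure.
knit_to_word <- function(name) {
  if (!nzchar(Sys.which("pandoc"))) {
    stop("pandoc was not found on the PATH")
  }
  # envir = new.env() keeps objects created by the .Rmd file
  # out of the calling environment.
  knitr::knit(paste0(name, ".Rmd"), encoding = "utf-8", envir = new.env())
  status <- system(pandoc_cmd(name))
  if (status != 0) {
    stop("pandoc exited with status ", status)
  }
  invisible(paste0(name, ".docx"))
}

# Usage (needs demo.Rmd in the working directory):
# knit_to_word("demo")
```

Keeping the command construction in its own function makes it easy to print the command and check it before anything is run.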

Add references and a style sheet

Now let's make things a bit more interesting. What about adding references? Go to your reference manager of choice, export your library as a BibTeX file, and save it in the same directory as above. For this step I have so far found Mendeley the best, because it will automatically synchronise with the BibTeX file, so there's no need to re-export the library every single time you add a reference.

Now add references as follows:

O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results [@OHara2010].

Which results in:

O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results (O’Hara and Kotze 2010).

You can also add footnotes, which Word will read in the correct order. Just give them all a unique label, indicating where the note should go, and then subsequently specify the footnote text:

A linear model is inappropriate for count data, as it will predict values below 0 [^mynote1].

[^mynote1]: O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results [@OHara2010].

Output:

A linear model is inappropriate for count data, as it will predict values below 0.¹

¹ O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results (O’Hara and Kotze 2010).

And that's just about the basics covered. Compile this document with the function below:

knitsDoc <- function(name) {
    library(knitr)
    knit(paste0(name, ".Rmd"), encoding = "utf-8")
    system(paste0("pandoc -o ", name, ".docx ", name, ".md --bibliography library.bib --csl taylor-and-francis-harvard-x.csl"))
}

As before, the function will use the command line to execute, only now we've added in a few extra options: we've specified where our bibliography lies (remember, we saved this as library.bib), and we've also specified the citation style using the option 'csl'.
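One caveat if you are on a recent Pandoc: from version 2.11 onwards citations are only processed when you also pass `--citeproc`, whereas the version I used handled them as soon as a bibliography was supplied. Below is a rough sketch of how the argument list could be built with that in mind. The helper name `pandoc_args` is my own invention, and the defaults simply repeat the file names used above.

```r
# Assemble the pandoc arguments as a character vector.
# Set citeproc = FALSE for older Pandoc versions that do not know the flag.
pandoc_args <- function(name,
                        bib = "library.bib",
                        csl = "taylor-and-francis-harvard-x.csl",
                        citeproc = TRUE) {
  args <- c("-o", paste0(name, ".docx"),
            paste0(name, ".md"),
            "--bibliography", bib,
            "--csl", csl)
  if (citeproc) {
    args <- c(args, "--citeproc")
  }
  args
}

# Usage (needs pandoc, demo.md, library.bib and the .csl file):
# system2("pandoc", shQuote(pandoc_args("demo")))
```

Passing a vector to system2() rather than pasting one long string means each file name is quoted on its own.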

Any number of style formats can be downloaded from here, to match whatever journal or style you need to use. Or write your own. Just save it in your working directory, and call it by name, as above.

The end product:

The simplest way to really see how great Pandoc is is to try some of this code. Or compare my markup document with its output.

Other approaches:

A number of other good options exist. You could for instance use the package 'pander' together with a very clever set of utilities, knitcitations, to keep the operation within R. It's a bit more fiddly, and in my experience slightly more buggy (importing into R adds an extra step in which references can get corrupted), but it works remarkably well. For this, see the code below:

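# For exporting to Word from within R
library(knitr)   # provides knit()
library(pander)  # provides Pandoc.brew()
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
# Brew the knitted markdown and have Pandoc convert the result to docx
Pandoc.brew(file = paste0(name, ".md"), output = paste0(name, "docx"), convert = "docx")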

# Importing references to within R.
library(devtools)
install_github("cboettig/knitcitations")  # current devtools expects "user/repo"
library(knitcitations)
bib <- read.bibtex("library.bib.part")

If you do go with the latter option, you might find this function useful - it will allow you to search within your reference library, and return the citation key for any matches. Very handy for that reference where you can't remember year of publication:

ref <- function(x) {
    bib[grep(x, bib, ignore.case = TRUE)]$key
}
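If you stay with the command-line route and never import the library into R, a similar lookup can be done on the .bib file directly. This is only a sketch in base R, and `find_keys` is a name I made up. It assumes each entry starts on its own line with something like `@article{OHara2010,`, and it will not cope with every BibTeX file.

```r
# Return the citation keys of all entries whose text matches `pattern`.
find_keys <- function(pattern, path = "library.bib") {
  lines <- readLines(path, warn = FALSE)
  # First line of each entry, e.g. "@article{OHara2010,"
  starts <- grep("^\\s*@\\w+\\s*\\{", lines)
  if (length(starts) == 0) {
    return(character(0))
  }
  # Each entry runs until the line before the next entry starts
  ends <- c(starts[-1] - 1, length(lines))
  # The key is whatever sits between "{" and the first comma
  keys <- sub("^\\s*@\\w+\\s*\\{\\s*([^,]*),.*$", "\\1", lines[starts])
  keep <- mapply(function(s, e) {
    any(grepl(pattern, lines[s:e], ignore.case = TRUE))
  }, starts, ends)
  keys[keep]
}

# Usage (needs library.bib in the working directory):
# find_keys("kotze")
```

Because the search runs over the whole entry, it matches on author, title or year, which is what you want when you only half remember a reference.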

29 comments:

  1. Although the "pander" package can be useful for converting markdown text to other formats by building on Pandoc, it's worth mentioning that the package was built to provide features similar to "knitr", besides "converting" almost any R object to markdown.
    Please see http://rapporter.github.com/pander/#brew-to-pandoc for details.

    Replies
    1. Awesome, thanks. It looks like a really handy package.

  2. I recently wrote a blog post about this where I wrap this up into a package called `reports`: http://trinkerrstuff.wordpress.com/2013/02/24/workflow-w-reports-package/ I believe we're working on very similar ideas at the same time. In fact I now look and see our blog posts went to Rbloggers the same day.

    Replies
    1. Hi Tyler, I haven't had a play with 'reports' just yet, but I have found your qdap package extremely useful. Do you have any ambitions of scaling it up to be more friendly for languages other than English? I keep finding in my work that half the tests I might want to do are dependent on English language dictionaries/lists/syntax. Integration with a translation api would go a long way to solve this! Cheers, R

    2. Hey Rolf I just saw you commented back (2 months later) sorry about that. Rolf right now qdap is definitely geared toward English. I'd love to see integration but I don't think that's possible. Right now I don't think it is because every language functions so differently from another. Many of the algorithms use an English specific algorithm to compute descriptives. Integration would require a team of people with heavy duty skills, language abilities, and logic beyond my own. This would essentially require a complete rewrite of 1/3 of qdap's functions. In my own work I have encountered a need to extend qdap to Korean but lack the knowledge of the language to even understand if my coding is correct. I am very open to a team of people making this a reality but do not think this is likely.

  3. I'm having some issues with getting images into the pdf, latex and docx output from pandoc. I can knit my Rmd files nicely and they put the images into the html as raw data. I can even turn that off and verify that the *.md file links to the chunk-*.png files as necessary, but none of the pandoc outputs have images embedded in them. Any suggestions?

    R

    Replies
    1. Sorry to have ignored your comment for so long. I have found pandoc to do well with R generated graphics, but that for src links they have to be stored in the root folder and linked appropriately. This is what I did the other day when I encountered this problem: copied my images into the figure folder generated by R in the root, and used this notation:

      ![](figure/memoryEvent.png)

      Hope that works for you, best, R

  4. It is easy to get started with Word, but it will be hard (or simply hell) to maintain, unless your colleagues promise not to touch the Word document when they read it. I had the pandoc integration in mind since the very beginning of knitr: https://github.com/yihui/knitr/issues/206 and I'm still thinking what would be the ideal approach. Hopefully there will be something interesting coming out in the near future. We are working on it.

    BTW, if the results are read only, I do not quite understand how Word could be an advantage for reading purposes. Isn't PDF way better? :)

    Replies
    1. Thanks for the comment, Yihui! I must admit my need for Word integration is not really compatible with proper reproducible research - more that just so many people express a strong preference for seeing word files, all from article submissions, to colleagues making 'track changes' style alterations. I also keep finding that I need multi language spell checking options, but that's just my research. If there was any way of getting these changes back into the Rmd file that would be great, but at the moment it's manual labour for me on that count!

  5. Thanks for this article! I invested a lot of time into learning LaTeX as part of literate programming principles (in Emacs first, to boot). I have--reluctantly, at first--switched to RStudio and markdown (via knitr--Yay! knitr). I really appreciate the advice on how to integrate pandoc. I need to produce long reports (with internal links) with the option to publish as PDF or html AND Word. If I want my supervisor's blessing to invest time into the literate programming approach, it needs to be really smooth. In my field, clinical epidemiology, manuscripts are submitted to journals as Word files. Also, my colleagues do not work with Linux and are not comfortable with a wide variety of scripting languages. Installing new software on office PCs means going through IT... I want to create a folder that I can archive for others that contains finished documents in whatever format they want and that allows them to recreate documents from the source files as simply as possible. It's pretty important that they have to install as little new software as possible, e.g. RStudio plus the required packages, and do everything from within RStudio. (It's nice for me, too, because I'm not much more advanced in computing.) I feel a little uneasy that my supervisor can change the Word report and it won't be reflected in the source files, but that's how she works and I'm there to help. I think I will sell literate programming better by being accommodating than being purist.

    In any case, I am grateful to you R gurus!

    Tanya

    Replies
    1. Hi Tanya, I have the same situation with my supervisor, which is why I looked into Pandoc in the first place. I just can't think of a good way to get changes back into the Rmd file. Possibly by saving as a txt file and finding some clever way to substitute edited text chunks. It's not high on my todo list right now, though! And Yihui is sure to come up with something much cleverer soon, anyway =) Best, R

    2. Hi, Rolf, I've just discovered this blog, and it's kind of a godsend for my own dissertation research (which also involves topic modeling "medium-sized" data that R was choking on). Thank you for being so generous in sharing what you've discovered!

      I haven't used knitr (yet), so I'm not sure if this applies, but it sounds like you're looking for a way to have colleagues comment on and/or edit the markdown that R outputs, without having to lose the aspects of markdown that make you want to use it in the first place. Is that right? If so, I wonder whether Nate Kontny's *Draft* might be a useful bridge. You can read the feature set at http://docs.withdraft.com/. Hope that helps!

      -Ben

    3. Thank you Ben, both for the kind words and the link to 'Draft'. I use Knitr less these days, and Draft looks like a very useful tool. Good luck with the dissertation!
      Rolf

  6. Thanks for the nice post! I have been using knitr for my data analyses, but was hesitant to use it for writing anything more substantial because I did not know how to incorporate references before I found this.

    One piece of advice. The `knit` function has the optional argument `envir`. By default the RMarkdown file is knit in the parent environment. Thus variables defined in the RMarkdown file can clash with variables in the parent environment. For example, if `demo.Rmd` assigns a value to the variable `name`, your next line using pandoc would not work. The solution is to set `envir = new.env()` when calling `knit`. This way no matter what variables your blog readers use in their RMarkdown files, your code would still work.

  7. Thanks for the great post, and an excellent blog in general.

    -Matthew A. Simonson PhD
