Below I briefly outline why Pandoc is an essential part of my research workflow, and demonstrate how to integrate it with a bibliographic system and code written in R to produce high-quality Word or PDF documents. I also include all the functions needed to get this working fast.
Knitr is great. I'm writing this in it right now. It 'knits' markdown together with R code and outputs some pretty excellent html pages. The difficulty is getting these into Word for final editing, emailing to colleagues, or similar. Try copy and paste, for instance, and you'll get the text and the formatting no problem, but any plots or tables will likely be replaced by Word's 'missing image' graphic. The solution: Pandoc.
There is an R package, pander, which works well. But for full functionality you're best off downloading the Pandoc application from here.
Write a markup document in RStudio, set your working directory to the location of your file, then compile it as follows:
name = "demo"
library(knitr)
knit(paste0(name, ".Rmd"), encoding = "utf-8")
system(paste0("pandoc -o ", name, ".docx ", name, ".md"))
The code above works by running Pandoc on the command line from within R.
And like magic, the document is created.
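If the `system()` call fails silently, it is usually because Pandoc is not on the PATH. A minimal sketch of a more defensive version of the same step follows; `pandoc_args()` and `run_pandoc()` are hypothetical helper names, not part of knitr or Pandoc, and the sketch assumes file names without spaces.

```r
# Hypothetical helper: build the argument vector for the pandoc call above.
pandoc_args <- function(name, to = "docx") {
  c("-o", paste0(name, ".", to), paste0(name, ".md"))
}

# Hypothetical wrapper: stop with a clear message if pandoc cannot be found,
# otherwise run it through system2() instead of a pasted command string.
run_pandoc <- function(name, to = "docx") {
  if (!nzchar(Sys.which("pandoc"))) {
    stop("pandoc was not found on the PATH")
  }
  system2("pandoc", pandoc_args(name, to))
}
```

After knitting `demo.Rmd`, calling `run_pandoc("demo")` should be equivalent to the `system()` line above.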
Add references and a style sheet
Now let's make things a bit more interesting. What about adding references? Go to your reference manager of choice, export a BibTeX file with your library, and save it in the same directory as above. For this step I have so far found Mendeley the best, because it will automatically synchronise with the BibTeX file, so there's no need to re-export the library every single time you add a reference.
Now add references as follows:
O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results [@OHara2010].
Which results in:
O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results (OHara and Kotze 2010).
You can also add footnotes, which Word will read in the correct order. Just give each one a unique label, place the label where the note should go, and then specify the footnote text:
A linear model is inappropriate for count data, as it will predict values below 0 [^mynote1].
[^mynote1]: O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results [@OHara2010].
Output:
A linear model is inappropriate for count data, as it will predict values below 0.[1]
[1] O'Hara et al. ran numerous tests to illustrate how log-transforming data consistently gave suboptimal results (OHara and Kotze 2010).
And that's just about the basics covered. Compile this document with the function below:
knitsDoc <- function(name) {
  library(knitr)
  # Knit the .Rmd to .md, then let Pandoc convert the .md to .docx
  knit(paste0(name, ".Rmd"), encoding = "utf-8")
  system(paste0("pandoc -o ", name, ".docx ", name, ".md --bibliography library.bib --csl taylor-and-francis-harvard-x.csl"))
}
As before, the function will use the command line to execute, only now we've added a few extra options: we've specified where our bibliography lies (remember, we saved this as library.bib), and we've also specified the style format using the option 'csl'.
Any number of style formats can be downloaded from here, to match whatever journal or style you need to use. Or write your own. Just save it in your working directory, and call it by name, as above.
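Since the bibliography and style file will change from project to project, it can help to pass them in rather than hard-code them. The sketch below only assembles the Pandoc arguments used in `knitsDoc()`; `citation_args()` is a hypothetical name. The `citeproc` switch is an assumption on my part: as far as I know, recent Pandoc releases only process citations when `--citeproc` is given, while older ones did so whenever `--bibliography` was present, so check the manual for your version.

```r
# Hypothetical helper: build the pandoc arguments for a document with
# citations. `bib` and `csl` are file names in the working directory.
citation_args <- function(name, bib = "library.bib", csl = NULL,
                          to = "docx", citeproc = FALSE) {
  args <- c("-o", paste0(name, ".", to), paste0(name, ".md"),
            "--bibliography", bib)
  # Only add a style sheet when one is supplied
  if (!is.null(csl)) args <- c(args, "--csl", csl)
  # Assumption: newer pandoc versions need this flag to resolve [@key] cites
  if (citeproc) args <- c(args, "--citeproc")
  args
}
```

The result can be handed to `system2("pandoc", ...)`, for example `system2("pandoc", citation_args("demo", csl = "taylor-and-francis-harvard-x.csl"))`.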
The end product:
The simplest way to see how great Pandoc really is, is to try some of this code. Or compare my markup document with its output.
Other approaches:
A number of other good options exist. You could, for instance, use the package 'pander' together with a very clever set of utilities, knitcitations, to keep the operation within R. It's a bit more fiddly, and in my experience slightly more buggy (importing into R adds an extra step in which references can get corrupted), but it works remarkably well. For this, see the code below:
# For exporting to Word from within R
library(knitr)   # provides knit()
library(pander)  # provides Pandoc.brew()
name = "demo"
knit(paste0(name, ".Rmd"), encoding = "utf-8")
Pandoc.brew(file = paste0(name, ".md"), output = paste0(name, "docx"), convert = "docx")
# Importing references to within R.
library(devtools)
install_github("cboettig/knitcitations")
library(knitcitations)
bib <- read.bibtex("library.bib.part")
If you do go with the latter option, you might find this function useful: it lets you search within your reference library and returns the citation key for any matches. Very handy for that reference where you can't remember the year of publication:
ref <- function(x) {
  # Return the citation keys of all entries matching x (case-insensitive)
  bib[grep(x, bib, ignore.case = TRUE)]$key
}
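If you would rather not import the library into R at all, roughly the same lookup can be done on the BibTeX file as plain text. This is only a sketch under simple assumptions: `find_keys()` is a hypothetical name, and it expects each entry to start on its own line in the form `@type{key,`.

```r
# Hypothetical helper: return the citation keys of all entries in a BibTeX
# file whose text matches `pattern` (case-insensitive).
find_keys <- function(pattern, bib_file = "library.bib") {
  lines <- readLines(bib_file, warn = FALSE)
  # Lines that open an entry, e.g. "@article{Key2010,"
  starts <- grep("^\\s*@\\w+\\s*\\{", lines)
  if (length(starts) == 0) return(character(0))
  keys <- sub("^\\s*@\\w+\\s*\\{\\s*([^,]+),.*$", "\\1", lines[starts])
  # Each entry runs until the line before the next entry starts
  ends <- c(starts[-1] - 1, length(lines))
  hit <- vapply(seq_along(starts), function(i) {
    any(grepl(pattern, lines[starts[i]:ends[i]], ignore.case = TRUE))
  }, logical(1))
  keys[hit]
}
```

For example, `find_keys("kotze")` would return the keys of every entry that mentions that name anywhere in its fields.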
Although the "pander" package can be useful for converting markdown text to other formats by building on Pandoc, it's worth mentioning that the package was built to provide features similar to "knitr", besides "converting" almost any R object to markdown.
Please see http://rapporter.github.com/pander/#brew-to-pandoc for details.
Awesome, thanks. It looks a really handy package
I recently wrote a blog post about this where I wrap this up into a package called `reports`: http://trinkerrstuff.wordpress.com/2013/02/24/workflow-w-reports-package/ I believe we're working on very similar ideas at the same time. In fact I now look and see our blog posts went to Rbloggers the same day.
Hi Tyler, I haven't had a play with 'reports' just yet, but I have found your qdap package extremely useful. Do you have any ambitions of scaling it up to be more friendly for languages other than English? I keep finding in my work that half the tests I might want to do are dependent on English language dictionaries/lists/syntax. Integration with a translation api would go a long way to solve this! Cheers, R
Hey Rolf I just saw you commented back (2 months later) sorry about that. Rolf right now qdap is definitely geared toward English. I'd love to see integration but I don't think that's possible. Right now I don't think it is because every language functions so differently from another. Many of the algorithms use an English specific algorithm to compute descriptives. Integration would require a team of people with heavy duty skills, language abilities, and logic beyond my own. This would essentially require a complete rewrite of 1/3 of qdap's functions. In my own work I have encountered a need to extend qdap to Korean but lack the knowledge of the language to even understand if my coding is correct. I am very open to a team of people making this a reality but do not think this is likely.
I'm having some issues with getting images into the pdf, latex, docx from pandoc. I can knit up my rmd files nicely and they put the images into the html as raw data. I can even not do that and verify that the *.md file links to the chunk-*-.png files as necessary but none of the pandoc output have images embedded in them. Any suggestions?
R
Sorry to have ignored your comment for so long. I have found pandoc to do well with R generated graphics, but that for src links they have to be stored in the root folder and linked appropriately. This is what I did the other day when I encountered this problem: copied my images into the figure folder generated by R in the root, and used this notation:
Hope that works for you, best, R
It is easy to get started with Word, but it will be hard (or simply hell) to maintain, unless your colleagues promise not to touch the Word document when they read it. I had the pandoc integration in mind since the very beginning of knitr: https://github.com/yihui/knitr/issues/206 and I'm still thinking what would be the ideal approach. Hopefully there will be something interesting coming out in the near future. We are working on it.
BTW, if the results are read only, I do not quite understand how Word could be an advantage for reading purposes. Isn't PDF way better? :)
Thanks for the comment, Yihui! I must admit my need for Word integration is not really compatible with proper reproducible research - more that just so many people express a strong preference for seeing word files, all from article submissions, to colleagues making 'track changes' style alterations. I also keep finding that I need multi language spell checking options, but that's just my research. If there was any way of getting these changes back into the Rmd file that would be great, but at the moment it's manual labour for me on that count!
Thanks for this article! I invested a lot of time into learning LaTeX as part of literate programming principles (in Emacs first, to boot). I have--reluctantly, at first--switched to RStudio and markdown (via knitr--Yay! knitr). I really appreciate the advice on how to integrate pandoc. I need to produce long reports (with internal links) with the option to publish as PDF or html AND Word. If I want my supervisor's blessing to invest time into the literate programming approach, it needs to be really smooth. In my field, clinical epidemiology, manuscripts are submitted to journals as Word files. Also, my colleagues do not work with Linux and are not comfortable with a wide variety of scripting languages. Installing new software on office PCs means going through IT... I want to create a folder that I can archive for others that contains finished documents in whatever format they want and that allows them to recreate documents from the source files as simply as possible. It's pretty important that they have to install as little new software as possible, e.g. RStudio plus the required packages, and do everything from within RStudio. (It's nice for me, too, because I'm not much more advanced in computing.) I feel a little uneasy that my supervisor can change the Word report and it won't be reflected in the source files, but that's how she works and I'm there to help. I think I will sell literate programming better by being accommodating than being purist.
In any case, I am grateful to you R gurus!
Tanya
Hi Tanya, I have the same situation with my supervisor, which is why I looked into Pandoc in the first place. I just can't think of a good way to get changes back into the Rmd file. Possibly by saving as a txt file and finding some clever way to substitute edited text chunks. It's not high on my todo list right now, though! And Yihui is sure to come up with something much cleverer soon, anyway =) Best, R
Hi, Rolf, I've just discovered this blog, and it's kind of a godsend for my own dissertation research (which also involves topic modeling "medium-sized" data that R was choking on). Thank you for being so generous in sharing what you've discovered!
I haven't used knitr (yet), so I'm not sure if this applies, but it sounds like you're looking for a way to have colleagues comment on and/or edit the markdown that R outputs, without having to lose the aspects of markdown that make you want to use it in the first place. Is that right? If so, I wonder whether Nate Kontny's *Draft* might be a useful bridge. You can read the feature set at http://docs.withdraft.com/. Hope that helps!
-Ben
Thank you Ben, both for the kind words and the link to 'Draft'. I use Knitr less these days, and Draft looks like a very useful tool. Good luck with the dissertation!
Rolf
Thanks for the nice post! I have been using knitr for my data analyses, but was hesitant for using it for writing anything more substantial because I did not know how to incorporate references before I found this.
One piece of advice. The `knit` function has the optional argument `envir`. By default the RMarkdown file is knit in the parent environment. Thus variables defined in the RMarkdown file can clash with variables in the parent environment. For example, if `demo.Rmd` assigns a value to the variable `name`, your next line using pandoc would not work. The solution is to set `envir = new.env()` when calling `knit`. This way no matter what variables your blog readers use in their RMarkdown files, your code would still work.
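A minimal sketch of that suggestion, assuming knitr is installed; `knit_isolated()` is a hypothetical name and not part of the post's original code. It knits in a fresh environment, so an assignment to `name` inside the .Rmd cannot overwrite the `name` used to build the file paths.

```r
library(knitr)

# Hypothetical variant of the compile step: evaluate the document's chunks
# in their own environment instead of the caller's.
knit_isolated <- function(name) {
  # envir = new.env() keeps variables created by the .Rmd out of this frame
  knit(paste0(name, ".Rmd"), envir = new.env(), quiet = TRUE)
}
```

`knit()` returns the path of the file it wrote, so the result can be passed straight on to the Pandoc step.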
Thanks for the great post, and an excellent blog in general.
-Matthew A. Simonson PhD