Knit Rmd to DOCx

Rich Jones

2021-05-09

Although it is very convenient to knit Rmd files to HTML, data analyses are usually part of a team effort, and collaborators often need the descriptive text and results for a research project proposal or a manuscript draft. Microsoft Word (docx) files are the lingua franca for proposals and manuscripts.

This chapter offers some suggestions for knitting Rmd analysis files to docx files.

YAML

Example YAML

---
title: "Goldstein R01: Data analysis and statistical power considerations"
author: "Rich Jones"
date: "8 May 2021"
output: 
  word_document:
     reference_docx: "reference_analysis_report_2021-05-09.docx"
---

The example YAML header knits the Rmd to a Word document and uses a specific reference document to format it. In the code above, the Rmd is knit in the analysis report format.

Reference docx (this is about the format of the docx that is generated)

I have prepared two reference docx files. These are Word documents generated by knitting an Rmd to docx and then modifying the styles and page setup to match a desired look and feel. The reference docx is named in the YAML above. The two files are similar: both use 11-point Arial and minimize white space.

[Figure: overview of the reference docx files]

Someday I will prepare one for manuscript sections that looks just like reference_analysis_report_2021-05-09.docx except with double line spacing. I might need versions of these in both Arial and Times New Roman.

Code blocks

Try to prepare your code so that automatic wrapping is minimized:

  • Using reference_NIH_continuation_page.docx, lines of code wrap to the next line if they are longer than 89 columns
  • Using reference_analysis_report_2021-05-09.docx, lines of code wrap if they are longer than 77 columns
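
One way to catch long lines before knitting is to scan the Rmd for code-chunk lines that exceed the limit of the reference docx you plan to use. The following is a minimal Python sketch, not part of the original workflow. The function name `long_code_lines` and the sample text are hypothetical, and it assumes chunks are delimited by lines that start with three backticks.

```python
# Wrap limits reported in the text for each reference docx.
WRAP_LIMITS = {
    "reference_NIH_continuation_page.docx": 89,
    "reference_analysis_report_2021-05-09.docx": 77,
}

# Three backticks, built this way so the marker never appears literally here.
FENCE_MARK = "`" * 3


def long_code_lines(rmd_text, limit):
    """Return (line_number, length) for code-chunk lines longer than `limit`."""
    hits = []
    in_chunk = False
    for number, line in enumerate(rmd_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE_MARK):
            in_chunk = not in_chunk  # a fence line opens or closes a chunk
            continue
        if in_chunk and len(line) > limit:
            hits.append((number, len(line)))
    return hits


# Hypothetical Rmd content: one short and one overly long line of R code.
sample = "\n".join([
    "Prose can be any length because Word reflows it.",
    FENCE_MARK + "{r}",
    "x <- 1",
    "y <- " + "+".join(["x"] * 45),
    FENCE_MARK,
])

limit = WRAP_LIMITS["reference_analysis_report_2021-05-09.docx"]
report = long_code_lines(sample, limit)
for number, length in report:
    print(f"line {number}: {length} columns (limit {limit})")
```

Only lines inside chunks are checked, since prose is reflowed by Word and does not need manual wrapping.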

Inserting references (this is about citations to the literature)

There are basically two ways to format your references. One is to have pandoc do it; the other is to have whatever reference manager you use with Microsoft Word do it, for example Endnote. If you will be sharing the Word document with collaborators, the only reasonable choice is Endnote, which is probably the most commonly used reference management software. There are free and open-source alternatives, but if your collaborators use Endnote, you should use Endnote too.

To insert references using Endnote, simply copy the citation into your Markdown as you would when working in Microsoft Word interactively. I have my Endnote citations formatted so that each citation is enclosed in curly brackets, which does not seem to conflict with Markdown.

Example:

 Due to the potential multiplicity in hypothesis testing 
 (aim 2), we will use the Benjamini-Yekutieli false 
 discovery rate limiting procedure to control 
 for false discovery {Benjamini, 2001 #9683}. Under the 
 assumption that, conditional on data that are observed, 
 the mechanisms underlying the generation of missing data 
 are unrelated to values that would have been observed, 
 had they been observed, we will use maximum likelihood 
 and Bayesian parameter estimation to obtain unbiased 
 parameter estimates in the presence of missing data 
 {Enders, 2010 #9522}.

Then, after you’ve generated the docx file, you can format the references from within Word.

If you do want to use Pandoc to format your references, there are multiple ways of doing this, but the approach I find least painful is:

  1. Prepare a BibTeX .bib file (see example_bibliography.bib for an example)
  2. Choose a good citation style (CSL) file. For a numbered reference style, see Numbered.csl; search the web for other formats (e.g., APA)
  3. When you want to cite an article while preparing your Rmd file, find the reference in Endnote, select the bibtex_export style (see for example here), copy it formatted (CMD-k), paste it into the .bib file, and then type the citation key into the Rmd text.

Example

 Due to the potential multiplicity in hypothesis testing 
(aim 2), we will use the Benjamini-Yekutieli false 
discovery rate limiting procedure to control 
for false discovery [@RN9683]. Under the 
assumption that, conditional on data that are observed, 
the mechanisms underlying the generation of missing data 
are unrelated to values that would have been observed, 
had they been observed, we will use maximum likelihood 
and Bayesian parameter estimation to obtain unbiased 
parameter estimates in the presence of missing data 
[@RN9522].
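
The two example paragraphs differ only in their citation markers: `{Benjamini, 2001 #9683}` becomes `[@RN9683]`. If you already have text with Endnote-style citations, the conversion can be sketched in a few lines of Python. This is an illustration only. The function name `to_pandoc` is hypothetical, and the `RN` prefix is an assumption taken from the examples above; your bibtex_export style may label entries differently.

```python
import re

# Matches a single-reference citation such as {Benjamini, 2001 #9683}:
# text without braces or "#", then "#" and the Endnote record number.
ENDNOTE_CITATION = re.compile(r"\{[^{}#]*#(\d+)\}")


def to_pandoc(text, prefix="RN"):
    """Rewrite {Author, Year #N} as [@<prefix>N]. The prefix is an assumption."""
    return ENDNOTE_CITATION.sub(lambda m: f"[@{prefix}{m.group(1)}]", text)


before = "to control for false discovery {Benjamini, 2001 #9683}."
after = to_pandoc(before)
print(after)
```

Citations that group several references inside one pair of brackets are left untouched by this pattern and would need to be handled by hand.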

And modify your YAML:

---
title: "Goldstein R01: Data analysis and statistical power considerations"
author: "Rich Jones"
date: "8 May 2021"
output: 
 word_document:
    reference_docx: "reference_analysis_report_2021-05-09.docx"
bibliography: bibliography.bib
csl: Numbered.csl
---
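
Because the citation keys are typed by hand, it is easy to cite a key that was never pasted into the .bib file. A quick pre-knit check can list such keys. The Python sketch below is one possible approach, not the author's method. The function name `missing_keys` and the sample strings are hypothetical, the BibTeX entry is a placeholder rather than a real reference, and the patterns assume simple alphanumeric keys like those in the example above.

```python
import re

# "@key" not preceded by a word character or dot (so e-mail addresses are skipped).
CITED_KEY = re.compile(r"(?<![\w.])@(\w+)")
# The key that follows "@type{" at the start of a BibTeX entry.
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def missing_keys(rmd_text, bib_text):
    """Return the sorted keys cited in the Rmd text but absent from the bib text."""
    cited = set(CITED_KEY.findall(rmd_text))
    defined = set(BIB_ENTRY.findall(bib_text))
    return sorted(cited - defined)


# Hypothetical inputs: two citations, but only one entry in the bibliography.
rmd_text = "for false discovery [@RN9683]. ... of missing data [@RN9522]."
bib_text = """@article{RN9683,
  title = {Placeholder entry for illustration}
}
"""

missing = missing_keys(rmd_text, bib_text)
print(missing)
```

In practice you would read the two strings from the Rmd file and from the file named in the `bibliography` field of the YAML.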

fin knitdox