I'm trying to write a report using the rmarkdown package. Unfortunately, it is customary in my field to submit reports as MS Word documents, so I can't always rely on the power of LaTeX and have to be able to convert my .Rmd to MS Word. Because I want to create PDF and MS Word files from the same source file, I'm looking for a general way to do this. I've got PDF working using the apa6 LaTeX document class. The .Rmd will look something like this when creating a Word file:

---
title: My title
abstract: This is the abstract.
author: John Doe
affiliation: Unknown
note: Nothing to say.

output:
  word_document:
    reference_docx: myreference.docx
---

Lorem ipsum.

I can create a Word document from this but for obvious reasons my custom yaml-variables (e.g. abstract) will not be rendered in the document.

So basically, my problem is the following:

When creating a Word document, how can I add a title page (including author names, affiliations, author notes, etc.) and another page with just the abstract before the document body ("Lorem ipsum")? The focus here is not on creating page breaks (there are other open questions on this), but rather: **is there a way to make pandoc use the custom YAML variables, place them at the beginning of the document, and assign styles to them?**

The rmarkdown package provides an `includes()` function, but it only works with HTML and PDF documents.

crsh
  • 1
    have you considered hacking together something with http://en.nothingisreal.com/wiki/GPP (Linux, MacOS only, unfortunately) – Ben Bolker Jan 02 '15 at 18:48
  • Hi Ben, thanks for the hint; I'll take a closer look at GPP. Eventually, I would prefer a solution that works on all operating systems, though. – crsh Jan 02 '15 at 19:17
  • 2
    I think this will be hard -- you're either going to end up re-implementing stuff in R to generate customized markdown (i.e. re-implementing the processing of the YAML markup), or finding some way to hack the pandoc generation of Word files. (This might be more of a `pandoc` question ...) http://stackoverflow.com/questions/15937631/rstudio-knitr-pandoc-word-how-do-i-get-a-new-page-in-my-docx ; http://www.surefoss.org/publishing-publizieren/all-you-need-is-text-markdown-via-pandoc-for-academia/ – Ben Bolker Jan 02 '15 at 19:36
  • 3
    pandoc, and therefore rmarkdown, [let you specify a template file](http://hackademic.postach.io/pandoc-and-academic-docx-files), where you could define your own formatting. I'm not sure if that could go as far as placing content on separate pages though – baptiste Jan 02 '15 at 22:32
  • Last I checked (and I did try many months ago), it does not take *content* from the file, just specific styles, some stylesheets, and document properties. If you can change one of the [specifically named styles](http://pandoc.org/README.html#options-affecting-specific-writers) (look for `--reference-docx`) to do what you want, that might work. – r2evans Jan 03 '15 at 05:28
  • @baptiste I think, generally, it would be possible to use reference-file styles to break a page after a paragraph to create a title page. However, AFAIK there are only 8 styles that pandoc regards and I need all of these for regular formatting. And so far I have been unable to find a way to incorporate the custom yaml-variables and assign a style to them... – crsh Jan 04 '15 at 14:25
  • I have updated the question to make what I'm trying to do more explicit. – crsh Jan 04 '15 at 14:35
  • Okay, I looked at the `pandoc` documentation and there are more than 8 styles: "Normal, Compact, Title, Subtitle, Authors, Date, Abstract, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5, Block Quote, Definition Term, Definition, Bibliography, Body Text, Table Caption, Image Caption; [character] Default Paragraph Font, Body Text Char, Verbatim Char, Footnote Ref, Link." This is interesting but unfortunately doesn't answer the question. – crsh Jan 04 '15 at 14:38
  • 1
    I think you cannot expect _custom_ variables to work at this stage, since the template is currently ignored (beyond styles). However those parameters already defined (abstract should be one of them) should work. If they don't, maybe check that you have the latest version of pandoc, and try running the conversion directly via pandoc to see if the issue is with rmarkdown. – baptiste Jan 04 '15 at 14:53
  • I am not sure how you can format the YAML. However, you could have a word document with a H1 class defined to have a page break before hand. Then, create your document in .Rmd with `YAML` then `abstract` then, `# My Title`. In my own document, this starts the other items on the next page. However, I cannot use any more `#` level headers in the document. – jessi Jan 27 '16 at 08:09

1 Answer

I have found that a preprocessor makes it possible to customize the Markdown file generated by rmarkdown (e.g. to add and modify a title page) before it is passed to pandoc for conversion to DOCX. Let's assume we want to add some information specified in a YAML parameter `note` just before the abstract (support for abstracts has since been added to pandoc).

To do so, we first need a preprocessor function that reads the input file, parses the YAML front matter, and customizes the input file:

my_pre_processor <- function(metadata, input_file, runtime, knit_meta, files_dir, output_dir, from) {

  # Identify YAML front matter delimiters
  input_text <- readLines(input_file, encoding = "UTF-8")
  yaml_delimiters <- grep("^(---|\\.\\.\\.)\\s*$", input_text)

  if(length(yaml_delimiters) >= 2 &&
     (yaml_delimiters[2] - yaml_delimiters[1] > 1) &&
     grepl("^---\\s*$", input_text[yaml_delimiters[1]])) {
    yaml_params <- yaml::yaml.load(paste(input_text[(yaml_delimiters[1] + 1):(yaml_delimiters[2] - 1)], collapse = "\n"))
  } else yaml_params <- NULL

  # Modify title page
  custom_lines <- c(
    "NOTE:"
    , metadata$note
    , "\n\n"
    , "# Abstract"
    , "\n"
    , metadata$abstract
    , "\n"
  )

  ## Add modified title page components after YAML front matter
  augmented_input_text <- c(custom_lines, input_text[(yaml_delimiters[2] + 1):length(input_text)])

  # Remove redundant default abstract
  yaml_params$abstract <- NULL

  # Add modifications to input file
  augmented_input_text <- c("---", yaml::as.yaml(yaml_params), "---", augmented_input_text)
  input_file_connection <- file(input_file, encoding = "UTF-8")
  writeLines(augmented_input_text, input_file_connection)
  close(input_file_connection)

  NULL
}
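
The front-matter handling is the fragile part of the function above, so here is that step in isolation as a rough sketch. `split_front_matter()` is a hypothetical helper name (it is not part of rmarkdown or yaml), and it uses base R only. It mirrors the delimiter check from the preprocessor, but also covers two cases the original indexing does not guard against: a file without front matter, and a file that ends right after the closing delimiter.

```r
# Hypothetical helper (not part of rmarkdown): split the lines of a document
# into its YAML front matter and its body.
split_front_matter <- function(lines) {
  # Same delimiter pattern as in my_pre_processor: "---" or "..."
  delimiters <- grep("^(---|\\.\\.\\.)\\s*$", lines)

  has_front_matter <- length(delimiters) >= 2 &&
    (delimiters[2] - delimiters[1] > 1) &&
    grepl("^---\\s*$", lines[delimiters[1]])

  # No usable front matter: everything is body
  if (!has_front_matter) {
    return(list(yaml = character(0), body = lines))
  }

  yaml_lines <- lines[(delimiters[1] + 1):(delimiters[2] - 1)]

  # Guard against a document that ends with the closing delimiter;
  # (n + 1):n would otherwise count backwards
  body_lines <- if (delimiters[2] < length(lines)) {
    lines[(delimiters[2] + 1):length(lines)]
  } else {
    character(0)
  }

  list(yaml = yaml_lines, body = body_lines)
}

doc <- c("---", "title: My title", "note: Nothing to say.", "---", "", "Lorem ipsum.")
parts <- split_front_matter(doc)
```

`parts$yaml` can then be collapsed with `paste(..., collapse = "\n")` and handed to `yaml::yaml.load()` as in the preprocessor, while `parts$body` is what the custom title page lines get prepended to.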

Now we need to define a custom format that utilizes our preprocessor:

my_word_document <- function(...) {
  config <- rmarkdown::word_document(...)

  # Wrap my_pre_processor in a function with the argument list that rmarkdown
  # pre-processors use (pattern adapted from the RMarkdown package,
  # https://github.com/rstudio/rmarkdown/blob/master/R/pdf_document.R)
  pre_processor <- function(metadata, input_file, runtime, knit_meta, files_dir, output_dir, from = NULL) {
    # my_pre_processor rewrites input_file in place and returns NULL,
    # i.e. it adds no extra pandoc arguments
    my_pre_processor(metadata, input_file, runtime, knit_meta, files_dir, output_dir, from)
  }

  config$pre_processor <- pre_processor
  config
}
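
One thing to be aware of: assigning to `config$pre_processor` replaces whatever preprocessor `rmarkdown::word_document()` may already define. Whether it defines one depends on the rmarkdown version, so this is an assumption on my part. A minimal sketch of how both could be kept, using a hypothetical helper `chain_pre_processors()` (base R only, not an rmarkdown function):

```r
# Hypothetical helper: run `first` for its side effects on the input file,
# then return whatever the format's original pre-processor returns
# (extra pandoc arguments, or NULL if there is no original).
chain_pre_processors <- function(first, original) {
  function(...) {
    first(...)
    if (is.function(original)) original(...) else NULL
  }
}

# Stand-ins to show the call order. Real pre-processors are called with
# (metadata, input_file, runtime, knit_meta, files_dir, output_dir).
calls <- character(0)
custom <- function(...) { calls <<- c(calls, "custom"); NULL }
base <- function(...) { calls <<- c(calls, "base"); c("--toc") }

combined <- chain_pre_processors(custom, base)
extra_args <- combined(list(), "input.md")
```

Inside `my_word_document()` this would read `config$pre_processor <- chain_pre_processors(my_pre_processor, config$pre_processor)`, so the custom step edits the file first and the format's own preprocessor still gets to run.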

Now, you can use the custom format when rendering R Markdown documents as follows:

rmarkdown::render("./foo/bar.Rmd", output_format = my_word_document())
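
As for assigning styles to the inserted text: pandoc's DOCX writer honors a `custom-style` attribute on divs and spans, so the preprocessor could wrap each inserted block in a div instead of writing bare lines. The sketch below assumes a pandoc version that supports the fenced div syntax (`:::`, pandoc 2.x) and a style called "Author Note" in the reference document. Both the style name and the helper `styled_block()` are made up for illustration.

```r
# Hypothetical helper: wrap text in a pandoc fenced div that carries a
# custom-style attribute. Blank lines before and after keep the div
# separated from neighbouring blocks.
styled_block <- function(text, style) {
  c(
    "",
    sprintf('::: {custom-style="%s"}', style),
    text,
    ":::",
    ""
  )
}

# Example values taken from the YAML header in the question
custom_lines <- c(
  styled_block("Nothing to say.", "Author Note"),
  styled_block("This is the abstract.", "Abstract")
)
```

In `my_pre_processor()`, `custom_lines` built this way (from `metadata$note` and `metadata$abstract`) would take the place of the plain "NOTE:" and "# Abstract" lines.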
crsh