Is there a way to generate a cached version of an rmarkdown document and then generate multiple outputs directly from the cache?

Question

I'm performing some computationally intensive operations that I would like to generate reports from. I'm experimenting with bookdown or straight rmarkdown. Essentially I'd like an html_document report and a word_document report.

My .Rmd file looks like this:

---
title: "My analysis"
author: "me"
date: '2019-12-17'
output:
  bookdown::word_document2:
    highlight: tango
    df_print: kable
    reference_docx: Word_template.docx
    toc: yes
    toc_depth: 2
    fig_caption: yes
  bookdown::html_document2:
    theme: yeti
    highlight: tango
    df_print: paged
    toc: yes
    toc_depth: 2
    fig_caption: yes
    keep_md: yes
---
***

```{r child = 'index.Rmd', cache=TRUE}
```

```{r child = '01-Read_in_raw_data.Rmd', cache=TRUE}
```

```{r child = '02-Add_analysis.Rmd', cache=TRUE}
```

What happens is that the html and word documents get cached separately, which is a) time-consuming because they are run twice and b) annoying due to some exported files creating problems when caching (they are generated during the first knit operation but already exist for the second and subsequent ones and generate errors).

I've tried generating just the .md file but it doesn't change problem (a) and I just get really ugly reports from .md inputs with pandoc.

Does anyone have a more elegant way of doing this?

@ricoderks: Yes, if I can't find an rmarkdown-based solution, I intend to use drake. The problem is that doing the analysis in separate .R or .Rmd files doesn't really work well with drake. — biomiha, Dec 17 '19 at 13:21

score 1 · Answer 1 · answered Dec 17 '19 at 14:22

Oh, I can feel your pain. Here's my solution: I don't do expensive calculations in the markdown document. Instead I do them in a separate R script, where I can store the results and reload them later. The nice part is that I can then use the objects already in the workspace to create a markdown document and knit it.

library(rmarkdown)
library(knitr)

rmd_code <- function(){
    paste0(
        "---
title: \"My analysis\"
author: \"me\"
date: '2019-12-17'
output:
  bookdown::word_document2:
    highlight: tango
    df_print: kable
    reference_docx: Word_template.docx
    toc: yes
    toc_depth: 2
    fig_caption: yes
  bookdown::html_document2:
    theme: yeti
    highlight: tango
    df_print: paged
    toc: yes
    toc_depth: 2
    fig_caption: yes
    keep_md: yes
---
***

```{r child = 'index.Rmd', cache=TRUE}
```

```{r child = '01-Read_in_raw_data.Rmd', cache=TRUE}
```

```{r child = '02-Add_analysis.Rmd', cache=TRUE}
```
"
    )
}

# write the Rmd code into a file
cat(rmd_code()
    , file = "bla.Rmd")

# knit this R-Markdown file now
render(input = "bla.Rmd"
       , output_file = "yourOutPutFile.html")

# and now delete the R-Markdown file again
file.remove("bla.Rmd")

That way you can reuse calculations you have already done and keep working on your Rmd without rerunning them each time.
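The "store and reload" step is not shown in the code above. A minimal sketch of what it could look like, assuming the expensive work lives in a separate script and the result is saved with `saveRDS()`; the file name `analysis_results.rds` and the placeholder computation are hypothetical:

```r
# analysis.R: run the expensive step once and keep the result on disk
results_file <- "analysis_results.rds"  # hypothetical file name

if (!file.exists(results_file)) {
  # placeholder standing in for the real, expensive computation
  results <- lapply(1:3, function(i) i^2)
  saveRDS(results, results_file)
}

# In the Rmd (or right before calling render()), reload instead of recomputing
results <- readRDS(results_file)
```

The Rmd then only needs the `readRDS()` line, so every output format reads the same stored object instead of recomputing it.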

I appreciate the code; however, generating the 'bla.Rmd' file is less of a pain point for me than the actual rendering. I've tried using `output_format = "all"` but it still executes each one individually. Unfortunately, following your code takes me just as much time as before. — biomiha, Dec 17 '19 at 15:44
Do you still do the calculations in the Rmd files (e.g. 01-Read_in_raw_data.Rmd)? Because if so, it doesn't help. You need to do the calculations in a separate document, so you have the objects in the global environment. And from there you can knit the markdown. Then you only need the time for knitting the document. — Georgery, Dec 18 '19 at 09:42

score 1 · Accepted Answer · answered Dec 23 '19 at 20:58

By default, the path to the cache database (generated by knitr) is dependent on the R Markdown output format. That is why the cache has to be regenerated for different output formats like HTML and Word. To use the same copy of the cache database for all output formats, you can manually specify a path that does not depend on the output format, e.g.,

```{r, setup, include=FALSE}
knitr::opts_chunk$set(cache.path = 'a/fixed/directory/')
```

However, please note that there is a reason why each output format uses its own cache path: the output from an R code chunk may depend on the output format. For example, a plot may be written out with the Markdown syntax `![](...)` for Word output, but could become `<img src="..." />` for HTML output. If you are sure that your code chunk doesn't have any side-effects (e.g., generating plots and tables), you are safe to use a fixed path for the cache database. Usually I would not recommend that you turn on `cache = TRUE` for whole documents (because caching is hard), but only cache the specific code chunks that are time-consuming.
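The last recommendation can be combined with the fixed cache path. A sketch of what the setup chunk could contain, assuming caching stays off by default and is switched on only in the headers of the slow chunks; the chunk label `fit-model` mentioned in the comment is hypothetical:

```r
# Contents of the setup chunk in the parent Rmd
knitr::opts_chunk$set(
  cache      = FALSE,                # do not cache chunks unless they ask for it
  cache.path = "a/fixed/directory/"  # same cache location for every output format
)

# A slow chunk then opts in through its own header, for example:
#   {r fit-model, cache=TRUE}
# Chunks without cache=TRUE are re-run on every render.
```

With this in place, rendering the second format should be able to reuse the cached results of the opted-in chunks, provided those chunks produce no format-dependent output.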
