My analysis is often a long pipeline split into small R Markdown files that I compile with knitr. I often want to start a new Rmd file where one (or more) earlier R Markdown files left off; specifically, I want access to the R environment those files produce. However, I don't want one mammoth file at the end; I want to keep the resulting .html file from each of them separate, since they get unwieldy otherwise. (I am not, however, concerned about having to run each of them manually, unlike the post knitr: Knitting separate Rnw documents within an Rmd document.)

For example: develop method A on the data (A.Rmd), then method B on the data (B.Rmd). Now compare the results of A and B (CompareAB.Rmd). Usually A.Rmd and B.Rmd both contain some long-running pieces I don't want to rerun, so I would also like to pull in the cache of both files; but if they haven't been run, or have been changed, I want them to be rerun. Another example: I want to make a presentation (e.g. LaTeX beamer) using the results of A.Rmd and B.Rmd. The point is not that the files share common code, but that they have separate code and depend on each other. I give a simple example of a parent and child file below, where I use Sys.sleep(10) in the child file to make it easy to detect whether the parent file is using the child file's cache or recalculating it.

The obvious way I see to do this is to include the chunk option child=c("A.Rmd","B.Rmd") at some point. However, this has two problems: 1) it doesn't seem to pull the cache of the child files, and instead recalculates them even if nothing has changed, and 2) it puts all of the output text and plots of the child files into my file.

Another option that would pull the cache is to call lazyLoad() on all of the cache files of the child files (A.Rmd and B.Rmd), but that has no way of knowing whether the cache is up to date with changes in the original files. It could work as a temporary hack, but it is not a real solution. Yet another workaround is to save the results that I need in later files to a file (.txt or .Rdata). This is definitely a solution, but it probably means I wind up rerunning the long parts of A.Rmd and B.Rmd anyway: I didn't originally know I'd want a specific object later, so I'd have to change the code, which invalidates the cache. It also results in a lot of ugly code for writing and reading, and the later file still wouldn't know whether the saved results were up to date. So I thought I'd check whether there is a more elegant way to do this (or perhaps this is starting to be a feature request instead).
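To make the save-to-file workaround at least aware of staleness, here is a minimal sketch using only base R. It stores an MD5 hash of the source file next to the saved results and recomputes only when that hash no longer matches. The helper `load_or_rebuild()` and its arguments `source_file`, `result_file`, and `build` are hypothetical names made up for illustration; they are not part of knitr.

```r
# Hypothetical helper (not knitr API): reuse saved results unless the
# source file has changed since they were written.
load_or_rebuild <- function(source_file, result_file, build) {
  hash_file <- paste0(result_file, ".md5")
  current   <- unname(tools::md5sum(source_file))
  fresh <- file.exists(result_file) && file.exists(hash_file) &&
    identical(readLines(hash_file, warn = FALSE), current)
  if (fresh) {
    return(readRDS(result_file))  # source unchanged: reuse saved objects
  }
  result <- build()               # source is new or changed: recompute
  saveRDS(result, result_file)
  writeLines(current, hash_file)
  result
}

# Usage with a stand-in source file and a cheap stand-in computation.
src <- tempfile(fileext = ".Rmd")
rds <- tempfile(fileext = ".rds")
writeLines("x <- 1:3", src)
runs  <- 0
build <- function() { runs <<- runs + 1; list(x = 1:3) }

first  <- load_or_rebuild(src, rds, build)  # computes and saves
second <- load_or_rebuild(src, rds, build)  # loads from disk
writeLines("x <- 1:4", src)                 # simulate an edit to the source
third  <- load_or_rebuild(src, rds, build)  # hash differs: computes again
```

This is coarser than knitr's own cache: any edit to the source file, including prose, triggers a rebuild, and `build` still has to decide which objects to return.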

This seems somewhat like the question How to source R Markdown file like `source('myfile.r')`?, but that question is from 3 years ago (it is not clear whether the child option existed then), and the asker doesn't seem interested in pulling in the cache as well.

Here's my example. The child file (test.Rmd) would be:

---
title: "Test1"
author: "Me"
date: "April 7, 2015"
output: html_document
---
```{r setup}
knitr::opts_chunk$set(cache=TRUE, cache.path = "child_cache/", fig.path="child_figure/")
```

Some text about child process

```{r run1}
x<-seq(1,10,length=100)
y<-rnorm(n=100)
Sys.sleep(10) # slow step, so it is easy to see whether the cache is used
plot(x,y,main="Child Plot")
```

And my 'parent' file (testParent.Rmd) is below; it would read in the child and make use of the x and y defined in test.Rmd above.

---
title: "TestParent"
author: "Me"
date: "April 7, 2015"
output: html_document
---
```{r setupParent}
knitr::opts_chunk$set(cache=TRUE, cache.path = "parent_cache/", fig.path="parent_figure/")
```

```{r bringInChild, child="test.Rmd"}

```

Some text about Parent process

```{r useRun}
x2<-x*1000
y2<-y*1000
plot(x2,y2,main="Parent Plot")
```
epurdom
  • This sounds like a workflow similar to mine. I can't post a response now, but I suggest looking up the `knit_child()` function. Run that within a chunk, and set the chunk to include=FALSE. The cache should be respected. – Kalin May 01 '15 at 04:27
  • @user29020 I tried using `knit_child()` within my chunk with the `include=FALSE` option (and got rid of the `child="test.Rmd"` option). It was erratic as to when it would detect changes in the child file. Even if I globally set cache=FALSE, it still didn't rerun the child file each time so changes in the child file were not incorporated into the parent file. Similarly, even if I separately ran the child file to update its cache, the parent file didn't catch it. Are there other settings that it needs (for the environment option of `knit_child`, for example)? – epurdom May 11 '15 at 21:14
  • I'm not sure if this applies, but pls note that any chunk in the parent with `knit_child(...)` *should not cache*. The parent actually can't "detect changes in the child file". Roughly speaking, the parent only detects changes in the text of its *own* chunks. It's important to fully "knit" the child each time and let the child do any caching it might need to do within the child itself. Since you set the parent to cache all chunks by default, the *result* of `knit_child(...)` is being cached by the parent. **Best guess solution**: Set `cache=FALSE` for each chunk that calls `knit_child(...)`. – Kalin May 11 '15 at 23:56
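The suggestion in these comments could be sketched as the following replacement for the `bringInChild` chunk in testParent.Rmd. This is an untested R Markdown fragment, not a confirmed fix. It assumes test.Rmd sits next to the parent, and it relies on `knit_child()` evaluating the child in the parent's knit environment by default, so that `x` and `y` become visible to later parent chunks.

```{r bringInChild, cache=FALSE, include=FALSE}
# cache=FALSE: the parent re-knits the child on every run, so the child's
# own cache (child_cache/) decides what is actually recomputed.
# include=FALSE: the child's text and plots are kept out of the parent output.
child_out <- knitr::knit_child("test.Rmd", quiet = TRUE)
```

This does not produce a separate .html file for the child, so test.Rmd would still be knitted on its own for that.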

0 Answers