I'm attempting to make my code more modular: data loading and cleaning in one script, analysis in another, etc. If I were using R scripts, this would be a simple matter of calling `source` on `data_setup.R` inside `analysis.R`, but I'd like to document the decisions I'm making in an R Markdown document for both data setup and analysis. So I'm trying to write some sort of `source_rmd` function that will allow me to source the code from `data_setup.Rmd` into `analysis.Rmd`.

What I've tried so far:

The answer to "How to source R Markdown file like `source('myfile.r')`?" doesn't work if any chunk names are repeated (a problem, since the chunk named `setup` has special behavior in RStudio's notebook handling). "How to combine two RMarkdown (.Rmd) files into a single output?" is about combining entire documents, not just the code from one, and also requires unique chunk names. I've tried `knit_expand` as recommended in "Generate Dynamic R Markdown Blocks", but that requires naming chunks with variables in double curly braces, and I'd really like something that is easy for my collaborators to use as well. And `knit_child`, as recommended in "How to nest knit calls to fix duplicate chunk label errors?", still gives me duplicate label errors.

Empiromancer
  • One trick to avoid duplicate label errors is to simply not use labels. This is what people usually do when using child documents. – Brandon Bertelsen Jan 31 '17 at 16:30
  • @BrandonBertelsen It wouldn't be too much of an issue for _me_ to adjust my workflow, but I'm also trying to provide this function to my collaborators to make it easy for them to code with a better workflow as well. Thus, ideally I'm looking for some way to make this robust to all sorts of things someone might do to an Rmd document. If I make it too difficult to use, I worry people just won't separate their files at all. (Also, R notebooks treat the chunk named `setup` differently, so I'll want one of those in each document I write for when it runs standalone). – Empiromancer Jan 31 '17 at 16:35
  • I don't know of a workaround. I myself have the same problem. I usually leave my child documents without a YAML header and without chunk labels. Maybe that's part of your process of putting it together (stripping chunk labels and YAML). It would be nice to be able to have child documents fully reproducible themselves as well. – Brandon Bertelsen Jan 31 '17 at 16:40
  • @BrandonBertelsen As it turns out, Yihui added the ability to handle duplicate chunk labels as an option, which circumvents the major problem with many of these solutions. – Empiromancer Jan 31 '17 at 16:57

1 Answer

After some further searching, I've found a solution. knitr has a package option that changes how duplicate chunk labels are handled: instead of failing with an error, it appends a number to the repeated label. See https://github.com/yihui/knitr/issues/957.

To set this option, use `options(knitr.duplicate.label = 'allow')`.
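
As a rough illustration of what the option changes, the sketch below purls a throwaway file in which two chunks share the label `setup`. The file contents and the names `dup_rmd`, `out_r` and `strict` are made up for the example; only `knitr::purl` and the `knitr.duplicate.label` option come from knitr itself.

```r
library(knitr)

# Hypothetical demo file: two chunks share the label `setup`.
fence <- strrep("`", 3)  # build the chunk fence without typing it literally
dup_rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  paste0(fence, "{r setup}"), "x <- 1", fence,
  "",
  paste0(fence, "{r setup}"), "y <- 2", fence
), dup_rmd)
out_r <- tempfile(fileext = ".R")

# Without the option, knitr is expected to stop on the repeated label;
# tryCatch keeps the script running and stores the error message instead.
options(knitr.duplicate.label = NULL)
strict <- tryCatch(
  purl(dup_rmd, output = out_r, quiet = TRUE),
  error = function(e) conditionMessage(e)
)

# With the option set, the repeated label is tolerated and both chunks are extracted.
options(knitr.duplicate.label = "allow")
purl(dup_rmd, output = out_r, quiet = TRUE)
readLines(out_r)
```

Note that the option is session-wide, so it stays in effect for later `knit` and `purl` calls until it is changed again.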

For the sake of completeness, the full code for the function I've written is

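source_rmd <- function(file, local = FALSE, ...){
  # Allow repeated chunk labels (e.g. `setup`) instead of failing with an error
  options(knitr.duplicate.label = 'allow')

  # Extract the R code from the Rmd into a temporary script, removed on exit
  tempR <- tempfile(tmpdir = ".", fileext = ".R")
  on.exit(unlink(tempR))
  knitr::purl(file, output = tempR, quiet = TRUE)

  # Honor `local`: an environment is used as-is, TRUE means the caller's
  # environment, and anything else means the global environment
  envir <- if (is.environment(local)) local else
    if (isTRUE(local)) parent.frame() else globalenv()
  source(tempR, local = envir, ...)
}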
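
A minimal usage sketch, assuming knitr is installed. To keep the example self-contained it repeats a compact version of the function above (always sourcing into the global environment), and it writes a small stand-in for `data_setup.Rmd` to a temporary file; the names `demo_rmd`, `raw` and `cleaned` are hypothetical.

```r
library(knitr)

# Compact variant of source_rmd from the answer, repeated here so the
# example runs on its own; it always sources into the global environment.
source_rmd <- function(file, ...) {
  options(knitr.duplicate.label = "allow")
  temp_r <- tempfile(fileext = ".R")
  on.exit(unlink(temp_r))
  knitr::purl(file, output = temp_r, quiet = TRUE)
  source(temp_r, local = globalenv(), ...)
}

# Hypothetical stand-in for data_setup.Rmd: prose plus two code chunks.
fence <- strrep("`", 3)  # build the chunk fence without typing it literally
demo_rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Data setup\"",
  "---",
  "",
  "Some prose documenting the cleaning decisions.",
  "",
  paste0(fence, "{r setup}"),
  "raw <- data.frame(x = c(1, NA, 3))",
  fence,
  "",
  paste0(fence, "{r clean}"),
  "cleaned <- raw[!is.na(raw$x), , drop = FALSE]",
  fence
), demo_rmd)

# Objects created in the Rmd chunks are now available to the calling script.
source_rmd(demo_rmd)
nrow(cleaned)  # 2: the row with NA was dropped
```

In `analysis.Rmd`, the equivalent call would go in an early chunk, e.g. `source_rmd("data_setup.Rmd")`.
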
Empiromancer
  • This is fantastically helpful. – Daniel Yudkin Oct 13 '20 at 00:09
  • I posted a gist here with a related version but that allows specification of chunks by label: https://gist.github.com/brshallo/e963b9dca5e4e1ab12ec6348b135362e – Bryan Shalloway Apr 01 '21 at 06:33
  • A nice variation / addition to this would be to be able to limit the call to read specific chunks of an RMD document. e.g. I have a notebook that contains 5 chunks, but I only want to run chunks 1 and 3 in to the script I am calling it from. – Brisbane Pom Jun 29 '21 at 00:02
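
The chunk selection asked about in the last two comments could be sketched roughly as follows. This is not the linked gist and not a knitr function: `source_rmd_chunks` is a hypothetical base-R helper that reads the Rmd directly, and it assumes plain chunk headers where the label is the first token after `r` (chunk options such as `eval = FALSE` are ignored).

```r
# Hypothetical helper: evaluate only the chunks whose labels appear in `labels`.
source_rmd_chunks <- function(file, labels, envir = globalenv()) {
  lines <- readLines(file)
  open  <- grep("^`{3}\\{r[ ,}]", lines)        # chunk headers
  close <- grep("^`{3}[[:space:]]*$", lines)    # closing fences
  code  <- character(0)
  for (i in open) {
    # The label is the first token after "r", up to a space, comma or brace
    label <- sub("^`{3}\\{r[ ,]*([^ ,}]*).*$", "\\1", lines[i])
    end <- close[close > i][1]
    if (label %in% labels && !is.na(end) && end > i + 1) {
      code <- c(code, lines[(i + 1):(end - 1)])
    }
  }
  invisible(eval(parse(text = code), envir = envir))
}

# Demo with a made-up three-chunk document; only two chunks are run.
fence <- strrep("`", 3)
demo <- tempfile(fileext = ".Rmd")
writeLines(c(
  paste0(fence, "{r first}"),              "a_val <- 1", fence,
  paste0(fence, "{r second, echo=FALSE}"), "b_val <- 2", fence,
  paste0(fence, "{r third}"),              "c_val <- 3", fence
), demo)

e <- new.env()
source_rmd_chunks(demo, labels = c("first", "third"), envir = e)
ls(e)  # "a_val" "c_val"
```

Parsing the file by hand sidesteps duplicate labels entirely, at the cost of not honoring any chunk options.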