Avoid loading data every time in knitr

Question

I am creating a document using knitr and I am finding it tedious to reload the data from disk every time I parse the document while I'm in development. I've subsetted that datafile for development to shorten the load time. I also have knitr cache set to on.

I tried assigning the data to the global environment using <<-, and using exists with where=globalenv(), but that did not work.

Anyone know how to use preloaded data from the environment in knitr or have other ideas to speed up development?

score 18 · Answer 1 · answered Nov 07 '17 at 17:45

When you knit a document with the "Knit" button in RStudio, it runs in a fresh R session, so objects in your global environment are not visible to it. This is intentional: accidentally referencing an object in the global environment is an easy way to break a reproducible analysis, and starting from a clean session each time ensures the RMarkdown file runs on its own, regardless of what is loaded in the global environment.

If you do have a use case which justifies preloading the data, there are a few things you can do.

Example Data

Firstly I have created a minimal Rmd file as below called "RenderTest.Rmd":

---
title: "Render"
author: "Michael Harper"
date: "7 November 2017"
output: pdf_document
---

```{r cars}
summary(cars2)
```

In this example, cars2 is a dataset I am referencing from my global session. Run on its own using the "Knit" command in RStudio, this will return the following error:

Error in summary(cars2): object 'cars2' not found: ... withCallingHandlers -> withVisible -> eval -> eval -> summary Execution halted

Option 1: Manually Call the render function

The render function from rmarkdown can be called from another R script. By default it evaluates the document in the environment it is called from (its envir argument defaults to parent.frame()) rather than in a fresh one, so the document can use any objects already loaded. As an example:

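# Build file: run this from the session that already holds the data
library(rmarkdown)

cars2 <- cars             # the object the Rmd expects to find
render("RenderTest.Rmd")  # evaluated in the calling environment by default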

I would, however, be careful doing this. The benefit of using RMarkdown is that it makes reproducing the analysis incredibly easy. As soon as you start using external scripts, things become more complicated to replicate, as not everything needed is contained within the file.
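If you would rather not depend on whatever happens to be in the global environment, one possible variation is to hand render an explicit environment through its envir argument. This is only a sketch: data_env is a name made up for illustration, and the render call is guarded so that it only runs when rmarkdown is installed and RenderTest.Rmd is in the working directory.

```r
# Keep the preloaded objects in a dedicated environment (hypothetical name).
data_env <- new.env()
data_env$cars2 <- cars  # 'cars' ships with R's datasets package

# Render only if rmarkdown and the example file are available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    file.exists("RenderTest.Rmd")) {
  # Chunks are evaluated in data_env, so summary(cars2) can find cars2.
  rmarkdown::render("RenderTest.Rmd", envir = data_env)
}
```

This keeps the list of objects the document depends on in one visible place, although the document still cannot be knitted on its own.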

Option 2: Save data to an R object

If you have some analysis which takes time to run, you can save the result of the analysis as an R object, and then you can reload the final version of the data into the session. Using my above example:

```{r dataProcess, cache = TRUE}
cars2 <- cars
save(cars2, file = "carsData.RData") # saves the 'cars2' dataset; 'file' must be named
```
and then we can just reload the data into the session:

```{r}
load("carsData.RData") # reloads the 'cars2' dataset
```

I prefer this technique. The chunk dataProcess is cached, so it is only run if its code changes. The results are saved to a file, which is then loaded by the next chunk. The data still has to be loaded into the session, but you can save the finalised dataset so that any data cleaning does not need to be repeated.
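A variation on the same idea, sketched here as plain R rather than as chunks, is to check whether the saved file exists and only rebuild it when it is missing. This sketch assumes a single object stored with saveRDS and readRDS instead of save and load, and the file name carsData.rds is hypothetical.

```r
# Hypothetical cache file in the working directory.
cache_file <- "carsData.rds"

if (file.exists(cache_file)) {
  # Fast path: reuse the dataset saved by an earlier run.
  cars2 <- readRDS(cache_file)
} else {
  # Slow path: build the dataset (stand-in for the real loading and
  # cleaning steps), then save it for next time.
  cars2 <- cars
  saveRDS(cars2, cache_file)
}
```

Delete the file to force the data to be rebuilt on the next run.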

Option 3: Build the file less frequently

With the updates made to RStudio over the past few years, there is less of a need to continuously rebuild the file. Chunks can be run directly within the file, and their output viewed inline. This can spare you a lot of time spent trying to optimise the script only to save a couple of minutes on compiling (which normally makes a good time to get a hot drink anyway!).

that is a great answer. At least in my running version R 4.0.3, the call to `save` requires specification of `file` argument: `save(cars2, file = "carsData.RData")` — tjebo, Mar 04 '21 at 19:14
Also, for the future reader, you might want to consider saving as .rds file instead, see discussion here: https://stackoverflow.com/questions/19967478/how-to-save-data-file-into-rdata — tjebo, Mar 04 '21 at 19:19
