Proper R Markdown Code Organization

Question

I have been reading about R Markdown (here, here, and here) and using it to create solid reports. I would like to take the small amount of code I currently run for ad hoc analyses and turn it into more scalable data reports.

My question is rather broad: Is there a proper way to organize your code around an R Markdown project? Say, have one script that generates all of the data structures?

For example: Let's say that I have the cars data set and I have brought in commercial data on the manufacturer. What if I wanted to attach the manufacturer to the current cars data set, and then produce a separate summary table for each company using a manipulated data set cars.by.name as well as plot a certain sample using cars.import?

EDIT: Right now I have two files open. One is an R Script file that has all of the data manipulation: subsetting and re-categorizing values. And the other is the R Markdown file where I am building out text to accompany the various tables and plots of interest. When I call an object from the R Script file--like:

```{r}
table(cars.by.name$make)
```

I get this error: `Error in summary(cars.by.name$make) : object 'cars.by.name' not found`

EDIT 2: I found this older thread to be helpful. Link

---
title: "Untitled"
author: "Jeb"
date: "August 4, 2015"
output: html_document
---


This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r}
table(cars.by.name$make)
```

```{r}
summary(cars)
summary(cars.by.name)
```

```{r}
table(cars.by.name)
```

You can also embed plots, for example:

```{r, echo=FALSE}
plot(cars)
plot(cars.import)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

What do you mean exactly by "have a separate script that generates all of the relevant objects, and then call those objects in the R Markdown script"? — Hong Ooi, Aug 05 '15 at 03:36
@jebediah I don't have experience with R markdown, but in your markdown document, could you `source` the separate script that generates the objects? — kevinsa5, Aug 05 '15 at 03:41
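
One way to read the `source` suggestion: when a document is knit with RStudio's Knit button it is rendered in a fresh R session, so objects that exist only in the interactive workspace (such as `cars.by.name`) are not found unless the document creates them itself. A minimal sketch, assuming the data manipulation lives in a script called `prep.R` (a hypothetical name) and using the built-in `mtcars` data as a stand-in for the asker's data. The script is written to a temporary file here only so the example is self-contained:

```r
# Hypothetical prep script, written to a temp file so this sketch runs on its own.
# In a real project this would simply be a file such as prep.R next to the .Rmd.
prep_file <- file.path(tempdir(), "prep.R")
writeLines(c(
  "# Stand-in data: first word of each mtcars row name is treated as the make",
  "cars.by.name <- data.frame(make = sub(' .*', '', rownames(mtcars)), mtcars)"
), prep_file)

# In the .Rmd, a setup chunk near the top would run the script:
source(prep_file)

# Later chunks can then use the objects the script created
table(cars.by.name$make)
```

The trade-off is that the script is re-run on every knit, which is fine for quick data manipulation but not for long-running analyses.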

score 12 · Answer 1 · answered May 19 '16 at 22:29

There is a solution for this sort of problem, explained here.

Basically, if you have an .R file containing your code, there is no need to repeat that code in the .Rmd file; you can include it from the .R file instead. For this to work, each chunk of code must be labeled in the .R file, and can then be included by label in the .Rmd file.

test.R:

## ---- chunk-1 ----
table(cars.by.name$make)

test.Rmd:

Just once, at the top of the .Rmd file:

```{r echo=FALSE, cache=FALSE}
knitr::read_chunk('test.R')
```

For every chunk you're including (replace chunk-1 with the label of that specific chunk in your .R file):

```{r chunk-1}
```

Note that the chunk should be left empty (as shown); when the document is knit, the code under that label in the .R file is inserted here and run.
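
As a sketch of what a script with more than one labeled section might look like (the labels and the `mtcars` stand-in data are hypothetical, not taken from the linked example):

```r
# test.R: each '## ---- label ----' line starts a named section that
# knitr::read_chunk() can map to an empty chunk with the same label.

## ---- prep-data ----
# Stand-in data: first word of each mtcars row name is treated as the make
cars.by.name <- data.frame(make = sub(" .*", "", rownames(mtcars)), mtcars)

## ---- make-table ----
make.counts <- table(cars.by.name$make)
make.counts
```

Because the labels are ordinary comments, the script still runs on its own; in the .Rmd, empty chunks labeled `prep-data` and `make-table` would pull in the matching sections.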

yes! this was my gateway to the yihui markdown examples including this way of Including :) To make a more useful comment --- while the accepted answer addresses reusing _data objects_, this is one good way to be reusing code, including just parts of other files. For me I have started developing library files with reusable functions, and only putting the relevant ones in a Notebook - in this reusable way - is extremely helpful. — Mike M, Feb 06 '21 at 17:28

score 4 · Accepted Answer · answered Aug 05 '15 at 03:56

Oftentimes I have many reports that need to run the same code with slightly different parameters. What I typically do is call all my "stats" functions in a separate script, save the results, and then just reference them in the report. The way to do this is as follows:

---
title: "Untitled"
author: "Author"
date: "August 4, 2015"
output: html_document
---

```{r, echo=FALSE, message=FALSE}
directoryPath <- "rawPath" # Something like /Users/userid/RDataFile
fullPath <- file.path(directoryPath, "myROutputFile.RData")
load(fullPath)
```

Some Text, headers whatever

```{r}
summary(myStructure$value1) #Where myStructure was saved to the .RData file
```

You can save an .RData file with the save.image() command, which writes the entire workspace; save() writes only the objects you name.
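
A minimal sketch of the save-then-load round trip, using `save()` on a single named object rather than `save.image()`. The contents of `myStructure` and the temporary path are placeholders:

```r
# Placeholder result object standing in for the output of a long-running analysis
myStructure <- list(value1 = c(2.5, 3.1, 4.8), value2 = letters[1:3])

# Save only the named object to an .RData file
out_file <- file.path(tempdir(), "myROutputFile.RData")
save(myStructure, file = out_file)

# Simulate a fresh session, then restore the object as the report's first chunk does
rm(myStructure)
load(out_file)
summary(myStructure$value1)
```

Naming the objects explicitly keeps the file small and makes it easier to see what the report depends on.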

Hope that helps!

user1357015

That is not only a solution but also a lesson for more R knowledge. Thanks! – Jebediah15 Aug 05 '15 at 04:02
I'm not sure this is in the "spirit" of R markdown. From its website: "R Markdown documents are fully reproducible" -- I would take this to imply that it is a standalone document without external dependencies. 1) I could be wrong, and 2) if it does what you want, maybe the spirit doesn't matter. Thoughts? – kevinsa5 Aug 05 '15 at 04:07
If your .RData is generated by a script, then you can source the file rather than loading. Many of my scripts take several hours to load so constantly "compiling" them in R Markdown would be untenable. – user1357015 Aug 05 '15 at 04:37
@kevinsa5 if you include the R scripts in whatever format you upload them in, I don't see the problem. As user1357015 points out, many of us have projects that take forever to run, and RMarkdown caching only goes so far. I usually use: `for( f in dir( intdir, pattern = "FinalResults", full.names=TRUE ) ) { load( f ) }` for bigger projects, then `save` results along the way. – Ari B. Friedman Aug 05 '15 at 05:35
@user1357015 and Ari, Good point. Thanks for the explanation. – kevinsa5 Aug 05 '15 at 05:53
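
The pattern described in these comments (run the slow code once, then reuse its saved results on later knits) can be sketched roughly as follows. The file name and the `slow_analysis()` function are hypothetical, and `saveRDS()`/`readRDS()` are used here in place of `save()`/`load()`:

```r
# Hypothetical stand-in for a computation that takes hours
slow_analysis <- function() {
  aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
}

results_file <- file.path(tempdir(), "FinalResults.rds")

if (file.exists(results_file)) {
  # Reuse results saved by an earlier run
  results <- readRDS(results_file)
} else {
  # First run: compute once and save for later knits
  results <- slow_analysis()
  saveRDS(results, results_file)
}

results
```

Deleting the saved file forces a recomputation, which also addresses the reproducibility concern as long as the generating script ships with the report.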
