Here issue in R scripts
I am trying to understand how here() can work in a portable way. Found it: see what works under "Final answer (TL;DR)" below. The bottom line: here() is not really that useful when running a script.R from the command line.
The way I understand it, with help from JBGruber: here() looks for the root directory of a project (e.g., an RStudio project, a Git project or another project defined with a .here file), starting at the current working directory and moving up until it finds one. If it doesn't find anything, it falls back to the full working directory, which for a script run by cron defaults to my home directory. One could, of course, pass the directory as a parameter in the cron command, but that is rather cumbersome. The answers below provide good explanations, and I have summarised what I found most immediately useful under "Final answer (TL;DR)". But make no mistake, Nicola's answer is very good and helpful too.
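To make the search described above concrete, here is a minimal base-R sketch of walking up from a starting directory until a marker file is found. It is not the here package's actual implementation: the function name find_root_sketch and its arguments are made up for illustration, and it only checks for a single marker file rather than all the project types here() recognises.

```r
# Hypothetical helper, for illustration only: walk up from `start` until a
# directory containing `marker` is found; fall back to `start` otherwise.
find_root_sketch <- function(start = getwd(), marker = ".here") {
  start <- normalizePath(start, winslash = "/", mustWork = TRUE)
  dir <- start
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)                 # found the project root
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      break                       # reached the filesystem root
    }
    dir <- parent
  }
  start                           # nothing found: fall back to the start
}
```

Because the search begins at the working directory and not at the script's location, a script started from /home/user never sees a .here file that lives in /home/user/src/myGoodScripts.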
Original Objective - write a set of R scripts, including R Markdown .Rmd files, so that I can zip the directory, send it to someone else and have it run on their computer, potentially a very low-end one such as a Raspberry Pi or old hardware running Linux.
Conditions:
- can be run from the command line via Rscript
- as above, but scheduled via cron
- the main method for setting up the working directory is set_here(), executed once from the console; the folder is then portable because the .here file is included in the zipped directory
- does not need RStudio, hence I do not want to use R projects
- can also be run interactively from RStudio (development)
- can be executed from shiny (I assume that will be OK if the above conditions are met)
I specifically do not want to create RStudio projects because, in my view, that makes it necessary to install and use RStudio, and I want my scripts to be as portable as possible and to run on low-resource, headless platforms.
Sample code:
Let us assume the working directory will be myGoodScripts, as follows:
/Users/john/src/myGoodScripts/
When starting development I would go to the above directory with setwd() and execute set_here() to create the .here file. Then there are 2 scripts, dataFetcherMailer.R and dataFetcher.Rmd, and a subdirectory bkp:
dataFetcherMailer.R
library(here)
library(knitr)
basedir <- here()
# this is where here should give path to .here file
rmarkdown::render(paste0(basedir,"/dataFetcher.Rmd"))
# email the created report
# email_routine_with_gmailr(paste0(basedir,"/dataFetcher.pdf"))
# now substituted with verification that a pdf report was created
file.exists(paste0(basedir,"/dataFetcher.pdf"))
dataFetcher.Rmd
---
title: "Data collection control report"
author: "HAL"
date: "`r Sys.Date()`"
output: pdf_document
---
```{r setup, include=FALSE}
library(knitr)
library(here)
basedir <- here()
# in actual program this reads data from a changing online data source
df.main <- mtcars
# data backup
datestamp <- format(Sys.time(),format="%Y-%m-%d_%H-%M")
backupName <- paste0(basedir,"/bkp/dataBackup_",datestamp,".csv.gz")
write.csv(df.main, gzfile(backupName))
```
# This is data collection report
Yesterday's data total records: `r nrow(df.main)`.
The basedir was `r basedir`
The current directory is `r getwd()`
The here path is `r here()`
The last 3 lines in the report should match, I guess. Even if getwd() does not match the other two, it should not matter, because here() would provide an absolute base path.
Errors
Of course, the above does not work. It only works if I execute Rscript ./dataFetcherMailer.R from the same myGoodScripts/ directory.
My aim is to understand how to execute the scripts so that relative paths are resolved relative to the script's location, and the script can be run from the command line independently of the current working directory. Right now I can run this from bash only after a cd to the directory containing the script. If I schedule cron to execute the script, the default working directory is /home/user and the script fails. My naive assumption, that regardless of the shell's current working directory basedir <- here() would give a filesystem point from which relative paths could be resolved, does not hold.
From Rstudio without prior setwd()
here() starts at /home/user
Error in abs_path(input) :
The file '/home/user/dataFetcher.Rmd' does not exist.
From bash with Rscript, if cwd is not set to the script directory:
$ cd /home/user/src
$ Rscript ./myGoodScripts/dataFetcherMailer.R
here() starts at /home/user/src
Error in abs_path(input) :
The file '/home/user/src/dataFetcher.Rmd' does not exist.
Calls: <Anonymous> -> setwd -> dirname -> abs_path
If someone could help me understand and resolve this problem, that would be fantastic. If another reliable method to set the base path without here() exists, I would love to know. Ultimately, executing the script from RStudio matters a lot less than understanding how to execute such scripts from the command line or cron.
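One alternative that does not rely on here() can be sketched with base R alone: when a script is started with Rscript, commandArgs(trailingOnly = FALSE) contains a --file= entry holding the script path, and its directory can serve as the base path. The helper name script_dir_from_args below is hypothetical, and the sketch assumes the path contains no characters that Rscript escapes (such as spaces).

```r
# Hypothetical helper, for illustration only: derive the script's directory
# from the "--file=" entry that Rscript adds to the command-line arguments.
script_dir_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- "^--file="
  hits <- grep(file_arg, args, value = TRUE)
  if (length(hits) == 0) {
    return(NA_character_)   # not started via Rscript / R --file=
  }
  path <- sub(file_arg, "", hits[1])
  # Resolve relative paths such as ./myGoodScripts/dataFetcherMailer.R
  # against the current working directory before taking the directory.
  dirname(normalizePath(path, winslash = "/", mustWork = FALSE))
}
```

Called at the top of dataFetcherMailer.R, this would give the myGoodScripts directory whether the script is launched from its own folder, from /home/user/src, or from cron. In an interactive session there is no --file= entry, so the sketch returns NA and the caller has to decide on a fallback.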
Update since JBGruber's answer:
I modified the function from JBGruber's answer a little so that it can return either the file name or the directory of the file. I am currently trying to modify it so that it works both when the .Rmd file is knitted from RStudio and when it is rendered from an R file.
here2 <- function(type = 'dir') {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    # sourced from within RStudio
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("interactive" %in% args) {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if ("--slave" %in% args) {
    # path is taken from between single quotes in the 6th argument
    string <- args[6]
    mBtwSquotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string, regexpr(mBtwSquotes, string, perl = TRUE))
  } else if (any(grepl("--file=", args, fixed = TRUE))) {
    # any(grepl()) instead of pmatch(): pmatch() returns NA when nothing
    # matches, and if (NA) is an error
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else {
    # unknown way of calling: give up
    if (type == 'dir') {
      filepath <- '.'
      return(filepath)
    } else {
      filepath <- "error"
      return(filepath)
    }
  }
  if (type == 'dir') {
    filepath <- dirname(filepath)
  }
  return(filepath)
}
I discovered, however, that commandArgs() are inherited from the R script, i.e. they remain the same for the .Rmd document when it is knit from a script.R. Therefore only the base path of the script.R location can be used universally, not the file name. In other words, this function, when placed in a .Rmd file, will point to the path of the calling script.R, not to the path of the .Rmd file.
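Since only the script's base path is reliable, one way to use it is to build the full path to the .Rmd file in the script and hand the base path to the renderer explicitly. The sketch below assumes the base path has already been determined somehow; the wrapper name render_report is made up, while knit_root_dir is an existing argument of rmarkdown::render() that sets the working directory used while knitting.

```r
# Hypothetical wrapper, for illustration only: render an .Rmd file that lives
# in `basedir`, with `basedir` as the working directory for its chunks.
render_report <- function(basedir, rmd = "dataFetcher.Rmd") {
  input <- file.path(basedir, rmd)
  if (!file.exists(input)) {
    stop("Cannot find ", input, call. = FALSE)
  }
  rmarkdown::render(input, knit_root_dir = basedir)
}

# Usage inside script.R, once basedir is known:
# render_report(basedir)
```

With this arrangement the .Rmd file does not need to work out its own location at all.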
Final answer (TL;DR)
A shorter version of here2() will therefore be more useful:
here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    # R script called from RStudio with "source file button"
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("--slave" %in% args) {
    # Rmd file called from RStudio with "knit button"
    # (if we placed this function in a .Rmd file)
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string, regexpr(mBtwQuotes, string, perl = TRUE))
  } else if (any(grepl("--file=", args))) {
    # called in some other way that passes --file= argument
    # R script called via cron or commandline using Rscript
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if (any(grepl("rmarkdown::render", args))) {
    # Rmd file called to render from commandline with
    # Rscript -e 'rmarkdown::render("RmdFileName")'
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=\")[^\"]*[^\"]*(?=\")"
    filepath <- regmatches(string, regexpr(mBtwQuotes, string, perl = TRUE))
  } else {
    # we do not know what is happening; taking a chance; could have error later
    filepath <- normalizePath(".")
    return(filepath)
  }
  filepath <- dirname(filepath)
  return(filepath)
}
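The two quote-matching branches rely on a lookbehind/lookahead regular expression that returns the text between the first pair of quotes in the argument. A small sketch of that step in isolation follows; the helper name extract_quoted and both sample strings are hypothetical stand-ins for what the render call looks like on the command line, not captured output.

```r
# Hypothetical helper, for illustration only: return the text between the
# first pair of `quote` characters in `string`.
extract_quoted <- function(string, quote = "'") {
  pattern <- sprintf("(?<=%s)[^%s]*(?=%s)", quote, quote, quote)
  regmatches(string, regexpr(pattern, string, perl = TRUE))
}

# Made-up example arguments in the two quoting styles handled above
knit_cmd <- "rmarkdown::render('/home/user/src/myGoodScripts/dataFetcher.Rmd', encoding = 'UTF-8');"
cli_cmd  <- "rmarkdown::render(\"dataFetcher.Rmd\")"

dirname(extract_quoted(knit_cmd))              # directory of the .Rmd file
dirname(extract_quoted(cli_cmd, quote = "\"")) # "." for a bare file name
```

Note that regexpr() only returns the first match, so any further quoted arguments in the call (such as the encoding) are ignored.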
NB: from within a .Rmd file, to get the directory containing the file it is enough to call normalizePath("."), which works whether you call the .Rmd file from a script, the command line or RStudio.