4

A while back I was reading an article about improving project workflow. The advice was not to use setwd or my computer would burn:

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE.

I started using the here package and it worked great until I started to schedule scripts using cronR. After asking this question my laptop was again threatened with arson:

If the first line of your #rstats script is wd <- here(), I will come into your lab and SET YOUR COMPUTER ON FIRE.

Fearing for my laptop's safety I started using the method suggested in the answer to get relative file paths:

wd <- Sys.getenv("HOME")
wd <- file.path(wd, "projects", "my_proj") 

That worked for me, but not for the people I was working with, who didn't have the same projects directory. So now I'm confused. What is the safest / best way to get relative file paths so that a project can be portable?

There are quite a few options: 1, 2. My requirements are to source functions/scripts and read/write csv files. Perhaps the rprojroot package is the best bet?

Pete900
  • 2
    I didn't visit links 1 or 2, but `rprojroot` or `here` are my go-to options for _projects_ (provided you've got one of the "markers" in the project directory, and you use projects like you should :-). With regard to scripts, the environment variable approach is pretty common for bash, python, etc. scripts and is a fine option for R. For some automation tasks I have a "jobs" folder where the R scripts are and a "conf" folder where Renviron-like files sit, and I `readRenviron` the associated one right at the top of the script. But for sharing projects, I see nothing wrong with `here::here()` – hrbrmstr Nov 22 '18 at 15:27
  • Create an environment variable for the root folder of each project you want to share in other ways than merely an interactive rstudio project. –  Nov 25 '18 at 18:30

4 Answers

3

Create an RStudio project and then reference all files with relative paths from the project's root folder. That way, all users will open the project and automatically have the correct working directory.
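
To illustrate, here is a minimal sketch of the question's requirements (sourcing a script, reading and writing csv files) done with paths relative to the project root. The folder and file names (`my_proj`, `R/helpers.R`, `data/input.csv`) are hypothetical, and the `setwd()` call only stands in for what opening the project does for you; it is not something to put in a real script.

```r
# Build a throwaway project so the example runs anywhere.
# All folder and file names here are hypothetical.
proj <- file.path(tempdir(), "my_proj")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)
writeLines("double_it <- function(x) x * 2", file.path(proj, "R", "helpers.R"))
write.csv(data.frame(value = 1:3), file.path(proj, "data", "input.csv"),
          row.names = FALSE)

# Stand-in for opening the project in RStudio, which sets the
# working directory to the project root.
old_wd <- setwd(proj)

# From here on, every path is relative to the project root.
source(file.path("R", "helpers.R"))
input <- read.csv(file.path("data", "input.csv"))
input$doubled <- double_it(input$value)
write.csv(input, file.path("data", "output.csv"), row.names = FALSE)

# Restore the previous working directory.
setwd(old_wd)
```

Using `file.path()` instead of pasting slashes together keeps the paths valid on both Windows and Unix-like systems.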

Felipe Gerard
  • What is an "R project"? – Ista Nov 22 '18 at 16:19
  • @Ista - see [R projects in RStudio](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) for details. – Len Greski Nov 22 '18 at 16:48
  • 1
    @LenGreski the article you linked to never uses the phrase "R project" because "R project" is not a thing. "Rstudio project" is a thing, but Rstudio is not R, and "Rstudio project" is not "R project". – Ista Nov 22 '18 at 17:01
  • 1
    @ista - I didn't intend to start an ontological debate here, I just explained what I thought Felipe intended by his post. I understand that projects are an RStudio feature, since the URL I posted refers to the RStudio website. – Len Greski Nov 22 '18 at 20:02
  • 1
    Thank you @LenGreski, that is indeed what I meant to say. I edited my answer to reflect this. – Felipe Gerard Nov 26 '18 at 19:00
2

There are many ways to organize code and data for use with R. Given that the "arsonist" described in the OP has rejected at least two approaches for locating the project files in an R script, the best next step is to ask the arsonist how s/he performs this function, and adjust your code and file structures accordingly.

UPDATE: Since the "arsonists" appear to be people writing on Tidyverse.org (see Tidyverse article in OP) and in an answer on SO (see additional links in OP), your computer appears to be relatively safe.

If you are sharing code or executing it with batch processes where the "user" is someone other than you, a useful approach is to place the code, data, and configuration under version control, and develop a runbook to explain how others can retrieve the components and execute them on another computer.

As noted in the comments to the OP, there's nothing wrong with here::here() if its use can be made reliable through documentation in a runbook.
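
For readers wondering what such a runbook needs to document: root-finding helpers of this kind generally work by walking up from the working directory until they hit a marker file. The sketch below shows that idea in base R only. It is not the actual code of 'here' or 'rprojroot', and `find_root` and the `.project-root` marker are hypothetical names chosen for the example.

```r
# Walk up from `start` until a directory containing `marker` is found.
# Base-R sketch of the general idea only; `find_root` and the
# ".project-root" marker are hypothetical names.
find_root <- function(marker, start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    # dirname() of the filesystem root returns the root itself.
    if (identical(parent, dir)) stop("no '", marker, "' found above ", start)
    dir <- parent
  }
}

# Demo in a throwaway directory tree.
root <- file.path(tempdir(), "demo_proj")
deep <- file.path(root, "analysis", "2018")
dir.create(deep, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".project-root"))

# Starting two levels down still resolves to the project root.
found <- find_root(".project-root", start = deep)
csv_path <- file.path(found, "data", "gen01.csv")
```

The runbook then only has to state which marker file must exist and that scripts must be started from somewhere inside the project tree.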

I structure all of my R code into Projects within RStudio, which are organized into a gitrepositories directory. All of the projects can be accessed as subdirectories from the gitrepositories directory. If I need to share a project, I make the project accessible to other users on GitHub.

In my R code I reference external files by paths relative to the project root directory, such as ./data/gen01.csv.

Len Greski
  • Thanks. Is a "runbook" something specific or just a README equivalent? – Pete900 Nov 22 '18 at 16:51
  • 1
    You're welcome, @Pete900. A [runbook](https://en.wikipedia.org/wiki/Runbook) is a set of procedures that is typically carried out by a technology Operations department. A README can be used as the place where a runbook is documented. Here is a very simple example that I wrote a few years ago for the Johns Hopkins *Getting and Cleaning Data* [final assignment](https://github.com/lgreski/cleaningdata#runscript). – Len Greski Nov 22 '18 at 16:57
2

There are two parts to this question:

  1. how to load data from a relative path, and
  2. how to load code from a relative path

For most use cases (including when invoking tools from a CRON job or similar) the location of the data should either be specified by the user (via command line arguments, standard input or environment variables) or should be relative to the current working directory (getwd() in R).

… Unless the data is a fixed part of the project itself — more on this below.
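
A minimal sketch of the user-specified approach, assuming a script that takes the data directory as its first command-line argument, falls back to an environment variable, and finally to the working directory. The names `resolve_data_dir` and `MY_PROJ_DATA` are hypothetical.

```r
# Resolve the data directory from, in order: the first command-line
# argument, an environment variable, and finally the working directory.
# `resolve_data_dir` and `MY_PROJ_DATA` are hypothetical names.
resolve_data_dir <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) >= 1) return(args[[1]])
  from_env <- Sys.getenv("MY_PROJ_DATA", unset = NA)
  if (!is.na(from_env) && nzchar(from_env)) return(from_env)
  getwd()
}

data_dir <- resolve_data_dir()
input_file <- file.path(data_dir, "input.csv")
```

A scheduled job could then pass the location explicitly on the `Rscript` command line, or set the variable in the job's environment, so nothing in the script depends on one person's home directory.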

Loading code from a path that’s relative to other code is simply not supported by base R. For example, source('xyz.r') won’t source an xyz.r file from the project. It will always try to load it from the current working directory, whatever that happens to be. Which is pretty much never what you want. And as you’ve noticed, the ‘here’ package also doesn’t always work.
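
The behaviour described above can be reproduced with a small experiment. The directory names are hypothetical, and `setwd()` is used only to simulate a script being launched from an unrelated directory, as a scheduler might do.

```r
# Two directories: one holds the scripts, the other is where R happens
# to be running. Both names are hypothetical.
proj <- file.path(tempdir(), "src_demo")
elsewhere <- file.path(tempdir(), "elsewhere")
dir.create(proj, showWarnings = FALSE)
dir.create(elsewhere, showWarnings = FALSE)

writeLines("x <- 42", file.path(proj, "xyz.r"))
# main.r sits next to xyz.r and sources it by relative path.
writeLines("source('xyz.r')", file.path(proj, "main.r"))

old_wd <- setwd(elsewhere)
# main.r is found through its full path, but its own source('xyz.r') is
# looked up in the working directory, not next to main.r.
result <- tryCatch(
  {
    source(file.path(proj, "main.r"))
    "ok"
  },
  error = function(e) "failed"
)
setwd(old_wd)
```

Here `result` ends up as `"failed"` because `xyz.r` does not exist in `elsewhere`. `source()` does accept `chdir = TRUE`, which temporarily switches the working directory to the sourced file's folder, but every caller has to remember to pass it.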

R basically only works when code is only loaded from packages. But packages aren’t suitable for all types of projects. R has no built-in solution for those other cases. I recommend using ‘box’ modules to solve this. ‘box’ provides a modern module system for R, which means that you can have R projects consisting of multiple code files (and nested sub-projects), without having to wrap them in packages. Loading code inside the same relative path in a module is as simple as

box::use(./xyz)

This always works, as you’d expect from a modern module system, and doesn’t require ‘here’ or similar hacks.

OK, back to the point about data that’s bundled with a project itself. If your project is an R package, you’d use system.file() to load that data. However, this once again doesn’t work for non-package projects. But if you use ‘box’ modules to structure your project, you can use box::file() to load data that’s associated with a module.
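
For the package case, a short sketch of `system.file()` using the 'stats' package, chosen only because it ships with R so the example runs without installing anything. The missing file name is deliberately made up. `box::file()` is not shown here since it needs the 'box' package installed.

```r
# system.file() resolves a path inside an installed package's directory.
# 'stats' ships with R, so this runs on any installation.
desc_path <- system.file("DESCRIPTION", package = "stats")

# A file that is not there yields an empty string rather than an error,
# so check before reading. This file name is deliberately made up.
missing_path <- system.file("extdata", "no_such_file.csv", package = "stats")

if (nzchar(desc_path)) {
  first_line <- readLines(desc_path, n = 1)
}
```

Because a missing file gives `""` instead of an error, a `nzchar()` check (or `mustWork = TRUE`) is worth having before passing the result to `read.csv()`.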

Packages such as ‘here’ or ‘rprojroot’, while well-intended, are essentially hacks to work around limitations in R’s handling of non-package code. The proper solution is to make non-package code into a first-class citizen of the R world, and ‘box’ does that.

Konrad Rudolph
  • 1
    Disclaimer: I’m the author of ‘box’. However, in my professional opinion it really is the only non-hacky solution to maintaining large non-package code bases in R because R fundamentally doesn’t cater for this use-case and lacks crucial infrastructure which ‘box’ adds. – Konrad Rudolph Apr 24 '22 at 21:21
0

You can check the docs of the RSuite package (https://RSuite.io). It works with script_path, which points to the currently running R script. I use it to build relative paths with the file.path command.