Finding out about the drake package was one of my best recent discoveries as an R user. However, one drawback I see with the package in terms of reproducibility is that it clutters the workspace with functions that are merely helpers.

It is hard to know whether these sourced functions clash, or whether the order of library calls matters. I know there is the conflicted package, but it only deals with packages. I know the code unit in R should be a package, but it seems strange to take an analysis with a handful of files like preprocessing.R and training.R and turn them into a package. Potential name clashes begin quite early anyway, and I've never seen anyone present a clean approach for R.

There is, however, the import package, which allows cherry-picking package functions as well as functions/variables from other files. Say you have a function a in a.R: after importing it with import, a is accessible, and its dependencies are available to a but are not themselves imported, which provides useful isolation.
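To illustrate the kind of isolation I mean, here is a minimal base-R sketch. It does not use import itself: sys.source() only stands in for what a module import does, and the names module_file, module_env, target_env and helper are made up for this example.

```r
# Hypothetical module file standing in for a.R: `a` depends on `helper`.
module_file <- tempfile(fileext = ".R")
writeLines(c(
  "helper <- function() 1",
  "a <- function() helper() + 1"
), module_file)

# Evaluate the file in its own environment (an assumption about how a
# module-style import behaves, not import's actual implementation).
module_env <- new.env(parent = globalenv())
sys.source(module_file, envir = module_env)

# Expose only `a` to the caller; `helper` stays behind in module_env.
target_env <- new.env(parent = globalenv())
assign("a", get("a", envir = module_env), envir = target_env)

# `a` still works because its enclosing environment is module_env,
# but `helper` is not visible in the environment that received `a`.
a_result <- get("a", envir = target_env)()
helper_visible <- exists("helper", envir = target_env, inherits = FALSE)
```

A tool that only inspects target_env would see a but never helper, which is the situation described in the next paragraph.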

I tested using the import package with drake, but drake does not detect when the dependencies of imported functions change, which breaks its actual use case. Does anyone know a way to tell drake to "drill down" on these functions, or any other way to make it work? Thanks in advance!

telegott

1 Answer

By design, drake only tracks the functions in the environment of make(), which you can set with the envir argument (plus namespaced functions called with pkg::fun(), but it was a mistake to build that capability). envir is just the calling environment by default (parent.frame()). So when you use import::from(), be sure to set .into equal to "" so it brings stuff into drake's environment.

ls()
#> character(0)
import::from(dplyr, mutate, .into = "")
ls()
#> [1] "mutate"
library(drake)
plan <- drake_plan(x = mutate(mtcars, x = 1))
vis_drake_graph(plan)

Created on 2020-09-05 by the reprex package (v0.3.0)

Incidentally, you just handed us an excellent alternative to envir = getNamespace("yourPackage") from https://github.com/ropensci/drake/issues/1286#issuecomment-649088321, the latter of which is limited if you want to pull functions from multiple sources. So thanks! Let's spread the word about this workaround.

landau
  • Thanks a lot for your answer! I added a full example (https://github.com/telegott/r-playground). The issue is that the functions that are _not_ imported from `helpers.R` are not monitored by `drake`. So if I change the number returned by the function that is not imported, `drake` still reports that everything is up to date. I assume there's no easy workaround, since these functions never appear, and should not appear, in the global environment, so `drake` would need to know about the special meaning of `import::here` – telegott Sep 06 '20 at 10:31
  • You could recurse over the top-level function, using `codetools::findGlobals()` to detect the nested functions at each stage. Or submit a feature request to `import` to make that happen natively. Otherwise, this will require extra work. You could `source()` all the scripts, use `vis_drake_graph()` to detect the nested function dependencies, and move all those into a tidier R script. The `Rclean` package tries to automate this for script-based workflows. – landau Sep 06 '20 at 13:20
  • Organizing code is deliberately outside `drake`'s scope, and I do not consider it a shortcoming of the tool that it is still possible for the user to write messy code. Whenever we write code, we all have to deal with the task of organizing and naming things. It's a skill we can get better at. – landau Sep 06 '20 at 13:26
  • In `drake`'s source (along with the source of its successor, [`targets`](https://github.com/wlandau/targets)), every top-level function gets its own script file which contains all its helpers, and each helper gets an informative prefix to indicate the top-level function it supports. Helpers widely used across many top-level functions go in `utils-*.R` scripts. – landau Sep 06 '20 at 13:29
  • Thanks for the suggestion with the prefix, that definitely helps with organization. So in short, one would need to modify `drake` to know that local functions which are imported can depend on other local functions which are not imported. – telegott Sep 13 '20 at 18:31
  • Each function that `drake` imports somehow needs to make it into `envir`. – landau Sep 14 '20 at 00:22
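
The recursion suggested in the comments (walking the top-level function with `codetools::findGlobals()`) could look roughly like the sketch below. It is only an illustration under stated assumptions: `module_env` stands in for the environment a file such as `helpers.R` was loaded into, `find_local_deps()` and `drake_env` are hypothetical names, and the result is an environment you might pass as `envir` to `make()`. Nothing here is part of `drake` or `import`.

```r
library(codetools)

# Hypothetical stand-in for helpers.R: only `top` is meant to be imported.
module_env <- new.env(parent = globalenv())
evalq({
  inner  <- function() 1
  middle <- function() inner() + 1
  top    <- function() middle() + 1
}, module_env)

# Recursively collect the names of functions defined directly in `env`
# that are reachable from the function called `name`.
find_local_deps <- function(name, env, seen = character()) {
  if (name %in% seen) return(seen)  # already visited, also guards against cycles
  seen <- c(seen, name)
  fun <- get(name, envir = env, inherits = FALSE)
  for (g in findGlobals(fun)) {
    is_local_fun <- exists(g, envir = env, inherits = FALSE) &&
      is.function(get(g, envir = env, inherits = FALSE))
    if (is_local_fun) seen <- find_local_deps(g, env, seen)
  }
  seen
}

deps <- find_local_deps("top", module_env)

# Copy everything that was found into a dedicated environment, so the
# nested helpers sit next to `top` instead of staying hidden.
drake_env <- new.env(parent = globalenv())
for (n in deps) assign(n, get(n, envir = module_env), envir = drake_env)
```

Note the trade-off: this deliberately gives up some of the isolation that `import` provides, because the helpers now live alongside `top`. Using a dedicated environment rather than the global one at least keeps them out of the workspace.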