
I am trying to set the future plan I need, but I am getting the following errors:

no_cores <- availableCores() - 2
plan(multisession, workers = no_cores, lazy = T, gc = T)

and the error is:

Error in MultisessionFuture(expr = expr, envir = envir, substitute = FALSE,  : 
  argument "expr" is missing, with no default

or:

plan(multisession, workers = no_cores, lazy = T, gc = T)
Error in tweak.future(function (expr, envir = parent.frame(), substitute = TRUE,  : 
  Future argument 'lazy' must not be tweaked / set via plan()

Please advise how I can set the workers, lazy, gc and other parameters of the multisession/multicore plans.

My R version is:

R.Version()
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "4"

$minor
[1] "0.2"

$year
[1] "2020"

$month
[1] "06"

$day
[1] "22"

$`svn rev`
[1] "78730"

$language
[1] "R"

$version.string
[1] "R version 4.0.2 (2020-06-22)"

$nickname
[1] "Taking Off Again"
SteveS

1 Answer


Trying your example, I got the second error message, and I basically followed the advice it gives: you set the argument lazy = T in plan(), which is not allowed. However, you can set this argument directly in the furrr function call:

library(furrr)
no_cores <- availableCores() - 2
plan(multisession, workers = no_cores, gc = T)

future_map(c("hello", "world"), ~.x,
           .options = future_options(lazy = TRUE))
[[1]]
[1] "hello"

[[2]]
[1] "world"
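Note that in later furrr versions (0.2.0 onward), future_options() is deprecated in favour of furrr_options(), which takes the same lazy argument. A sketch of the equivalent call, assuming furrr >= 0.2.0:

```r
library(furrr)

# Per-plan settings (workers, gc) go into plan()
plan(multisession, workers = availableCores() - 2, gc = TRUE)

# Per-future settings such as lazy go into the options object,
# not into plan()
future_map(c("hello", "world"), ~ .x,
           .options = furrr_options(lazy = TRUE))
```

The split is deliberate: plan() configures the backend, while furrr_options() tweaks how each individual future behaves.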
starja
  • I have a very big dataframe and for some reason I am getting ```Error in .jnew("opennlp.tools.postag.POSModel", .jcast(.jnew("java.io.FileInputStream", : java.lang.OutOfMemoryError: Java heap space``` But I have set: ```options(java.parameters = "-Xmx8000m")``` – SteveS Sep 19 '20 at 18:09
  • This sounds like you don't have enough RAM. Have you monitored the memory usage? Also, maybe it helps to check out the [advice here](https://stackoverflow.com/questions/34624002/r-error-java-lang-outofmemoryerror-java-heap-space) – starja Sep 19 '20 at 18:17
  • Maybe there are ways to split up your data, so that you only use a part of the df for every calculation; or, depending on which calculations you make, do you need to always pass the complete df, or only some rows/columns of it? – starja Sep 19 '20 at 18:18
  • I have 1M+ rows in my tibble and I need to process all of them using some functions. When I use htop on Ubuntu 18.04, it shows memory usage gets to around 7.5GB out of 16GB RAM and then fails with the Java heap space error. If it's ok with you, we can do a Zoom call and post the results here later. – SteveS Sep 19 '20 at 18:27
  • have you tried setting the java options before loading rJava and maybe use more than 8GB for the java heap? sorry, I'm not available for zoom – starja Sep 19 '20 at 18:32
  • Sure, it's now 10GB before all the libraries load. I mean, at the start of the whole program I have ```options(java.parameters = "-Xmx10000m")```. I have 2 Javas installed, 8 and 11, and 2 rJava libraries in my packages tab; maybe this is the reason? – SteveS Sep 19 '20 at 18:35
  • There's no reproducible code but note that **rJava** objects cannot be exported to parallel workers. See https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html for an example. This is true regardless what parallel framework you use in R. – HenrikB Sep 21 '20 at 23:18
  • Thanks for the package and the clarification Henrik! So in order to parallelise the code you would need to initialise the rJava objects on every worker? – starja Sep 22 '20 at 07:13
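Following Henrik's point that rJava objects cannot be exported to parallel workers, one way around it is to load rJava and build the Java objects inside each future, so nothing non-exportable crosses the worker boundary. A rough sketch, not tested against the asker's setup (chunks and the model path are hypothetical placeholders):

```r
library(future.apply)

plan(multisession, workers = 2)

# Hypothetical: split the big tibble into a list of row chunks first
# chunks <- split(df, cut(seq_len(nrow(df)), 10))

results <- future_lapply(chunks, function(chunk) {
  # Initialise Java inside the worker: rJava objects created in the
  # main session cannot be exported to it
  options(java.parameters = "-Xmx2g")
  library(rJava)
  # ... load the POSModel / tagger here, then process `chunk` ...
})
```

Each worker pays the cost of loading the model once per chunk, so fewer, larger chunks usually amortise that overhead better.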