When I try to use future_apply with plan(multisession), it says that the package I'm trying to use doesn't exist. When I use plan(sequential) it works fine. I also get the same error when using plan(callr).

Here's the error:

Error in loadNamespace(name): there is no package called 'fuzzyjoin'

Can anyone help me figure out a solution or what's going wrong here?

I'm not sure whether this is caused by future.apply, future, or globals, since I know all three are involved here.

Here's my code showing the issue:

library(fuzzyjoin)
library(future.apply)
#> Loading required package: future
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)


iris_mod<- iris %>%
  mutate(examplefield= Sepal.Width + Petal.Length,
         Species = as.character(Species))


iristype <- iris_mod$Species %>% unique()

plan(sequential)

test_sequential <- future_lapply(iristype, 
                               FUN = function(x) {
                                 fuzzyjoin::fuzzy_left_join(
                                   iris_mod %>% filter(Species %in% x),
                                    iris_mod, 
                                    by = c("Species"="Species",
                                           "examplefield"="Sepal.Length"),
                                    match_fun = list(`==`, `<`)
                                 )},
                               future.chunk.size= 2
)


plan(multisession)

test_multisession <- future_lapply(iristype, 
                                   FUN = function(x) {
                                     fuzzyjoin::fuzzy_left_join(
                                       iris_mod %>% filter(Species %in% x),
                                        iris_mod, 
                                        by = c("Species"="Species",
                                               "examplefield"="Sepal.Length"),
                                        match_fun = list(`==`, `<`)
                                     )},
                                   future.chunk.size=2
)
#> Error in loadNamespace(name): there is no package called 'fuzzyjoin'

Created on 2022-01-28 by the reprex package (v2.0.1)

I'm running R v4.0.3 if that's relevant.

Roger-123
  • Both options (multisession, sequential) work on my computer (Mac). Are both future and future.apply up to date? (first hypothesis) – Guillaume Jan 29 '22 at 12:54
  • Thanks for verifying! It ended up being some weird issue with my library paths not getting passed to the `future` correctly. – Roger-123 Jan 30 '22 at 21:40
  • Yes, always make sure to run `update.packages()` whenever something is not working (see my comment on the answer below). – HenrikB Feb 03 '22 at 03:55

1 Answer

I found that, for some reason, the library paths weren't being passed to the future workers correctly. My quick-and-dirty fix was to install the package into the library path that future was actually using:

install.packages("fuzzyjoin", lib= "C:/Program Files/R/R-4.0.3/library" )

Here's the code I ran to discover that my normal session and the future_lapply/future session were using different library paths:

.libPaths()
# [1] "\\\\networkfileservername/Userdata/myusername/Home/R/win-library/4.0" "C:/Program Files/R/R-4.0.3/library"

f_libs %<-% .libPaths()
print(f_libs)
# [1] "C:/Program Files/R/R-4.0.3/library"
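A possible alternative that avoids installing into the system library is sketched below. It compares the library paths of the main session with those of a worker, then restores the main session's paths at the start of each task. The variable names (`main_libs`, `worker_libs`, `missing_on_worker`) are made up for illustration, and this is only an assumed workaround: it relies on the network library folder being reachable from the worker processes, which I have not verified for every setup.

```r
library(future)
library(future.apply)

# Library paths as seen by the main R session.
main_libs <- .libPaths()

plan(multisession, workers = 2)

# Library paths as seen by a worker, and any paths the worker is missing.
worker_libs <- value(future(.libPaths()))
missing_on_worker <- setdiff(main_libs, worker_libs)
print(missing_on_worker)

# Assumed workaround: restore the main session's library paths at the start
# of each task. `main_libs` is detected as a global and exported to workers.
res <- future_lapply(1:2, function(i) {
  .libPaths(main_libs)
  .libPaths()  # return the paths the worker uses after the reset
})

plan(sequential)  # shut down the background workers
```

If `missing_on_worker` is empty, the workers already see the same libraries and the `.libPaths(main_libs)` call is unnecessary.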

Roger-123
  • Are you by any chance running with **parallelly** 1.29.0? There was a bug fix related to library paths on MS Windows network drives in **parallelly 1.30.0** (2021-12-17) that addresses exactly this problem for `multisession` futures, cf. https://parallelly.futureverse.org/news/index.html#bug-fixes-1-30-0. – HenrikB Feb 03 '22 at 03:54
  • Another comment: in order to install to `"C:/Program Files/R/R-4.0.3/library"`, you need to run R as Administrator. I strongly recommend never doing that, because (i) you have the power to destroy your R installation and, even worse, (ii) a malicious R package or R script has full access to everything on your MS Windows system. You should always be able to work with your personal folder (= `.libPaths()[1]`). If you cannot do that, the right thing is to reach out, as you did here. – HenrikB Feb 03 '22 at 05:46
  • I am facing the same issue. How/where should I set my library path to point to instead of `.libPaths()`? @HenrikB – JJ Fantini Sep 16 '22 at 15:53
  • Can confirm the problem is reproducible on parallelly 1.36.0, so it's not an internal bug fix issue – marine-ecologist Jul 31 '23 at 02:16
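
The version check suggested in the comments above can be sketched roughly as follows. The package list and the variable names (`pkgs`, `installed`, `versions`, `has_fix`) are chosen here for illustration only; the 1.30.0 threshold is the parallelly release named in the comment above.

```r
# Packages involved in running futures; this list is an assumption.
pkgs <- c("future", "future.apply", "parallelly", "globals")

# Keep only the packages that are actually installed.
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Report the installed version of each one.
versions <- vapply(installed,
                   function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)

# Check whether parallelly is at least the release with the network-drive fix.
if ("parallelly" %in% installed) {
  has_fix <- packageVersion("parallelly") >= "1.30.0"
  message("parallelly >= 1.30.0: ", has_fix)
}
```

As the last comment notes, a recent parallelly version does not guarantee the problem is gone, so this check only rules out one known cause.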