
I am running a parallelized calculation using foreach to work on a lot of time series simultaneously. Among those calculations (within a function called compute_slope()), I do something like this:

lBd <- floor(TMax^delta) # lower bound
uBd <-  ceiling(m * TMax^delta) # upper bound
    
# process is a tibble with columns `n` and `variance`
process %>% 
  dplyr::filter(between(n, lBd, uBd)) %>% 
  lm(data = ., log(variance) ~ log(n)) %>% 
  coefficients() %>% 
  .[2]
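
For illustration, the same step can be sketched in base R on toy data. The function name compute_slope_base and the synthetic process data below are assumptions made up for this sketch, not the original code; the argument order (process, m, delta, TMax) follows the call to compute_slope() that appears in the traceback further down.

```r
# Hypothetical base-R version of the truncation and regression step.
compute_slope_base <- function(process, m, delta, TMax) {
  lBd <- floor(TMax^delta)        # lower bound
  uBd <- ceiling(m * TMax^delta)  # upper bound
  # Keep only the rows with lBd <= n <= uBd
  truncated <- process[process$n >= lBd & process$n <= uBd, , drop = FALSE]
  fit <- lm(log(variance) ~ log(n), data = truncated)
  unname(coef(fit)[2])            # slope of the log-log regression
}

# Toy data (assumption): variance = n^0.5, so the log-log slope is 0.5
process <- data.frame(n = 1:1000)
process$variance <- process$n^0.5
slope <- compute_slope_base(process, m = 2, delta = 0.5, TMax = 1000)
```

Because this version avoids the pipe and dplyr, it can serve as a comparison point when checking whether the error depends on those packages.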

So, this is pretty straightforward: with the parameters TMax, delta and m, I truncate a time series on the left and on the right (using filter()) and then run a linear regression on the truncated time series. For some strange reason, everything works out nicely most of the time, but sometimes I get the following error (I suspect it is more likely for longer time series, i.e., when TMax is larger, but even that has been irregular):

✖ Problem with `filter()` input `..1`.
ℹ Input `..1` is `between(n, lBd, uBd)`.
✖ `ancestor` must be an environment"

I really have no clue how to interpret this error. I have also tried to replicate this "ancestor" error, but so far without luck. For instance, I have tried

library(tidyverse)
# This is the straightforward use-case and should work (it does here)
mpg %>% filter(between(hwy, 30, 31))
#> # A tibble: 11 x 11
#>    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
#>    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
#>  1 audi         a4         2    2008     4 manual~ f        20    31 p     comp~
#>  2 audi         a4         2    2008     4 auto(a~ f        21    30 p     comp~
#>  3 chevrolet    malibu     2.4  2008     4 auto(l~ f        22    30 r     mids~
#>  4 hyundai      sonata     2.4  2008     4 auto(l~ f        21    30 r     mids~
#>  5 hyundai      sonata     2.4  2008     4 manual~ f        21    31 r     mids~
#>  6 nissan       altima     2.5  2008     4 auto(a~ f        23    31 r     mids~
#>  7 toyota       camry      2.4  2008     4 manual~ f        21    31 r     mids~
#>  8 toyota       camry      2.4  2008     4 auto(l~ f        21    31 r     mids~
#>  9 toyota       camry s~   2.4  2008     4 manual~ f        21    31 r     comp~
#> 10 toyota       camry s~   2.4  2008     4 auto(s~ f        22    31 r     comp~
#> 11 toyota       corolla    1.8  1999     4 auto(l~ f        24    30 r     comp~

# bounds are undefined
mpg %>% filter(between(hwy, x, 31))
#> Error: Problem with `filter()` input `..1`.
#> i Input `..1` is `between(hwy, x, 31)`.
#> x object 'x' not found


# bounds are functions
mpg %>% filter(between(hwy, slice, 31))
#> Error: Problem with `filter()` input `..1`.
#> i Input `..1` is `between(hwy, slice, 31)`.
#> x cannot coerce type 'closure' to vector of type 'double'

In each case, a different (interpretable) error message was produced. I suspect that the error results from something weird happening during the parallel processing, but I am not sure what that could be. In any case, examples that trigger this ancestor error would be appreciated. Maybe from there I can work my way back to what goes awry in my calculations.

Update

I still cannot figure out what is going on with the parallelizations even after adding a traceback to the script. This is what it delivers

Error in { : 
  task 34 failed - "Problem with `mutate()` column `grid_estimates`.
ℹ `grid_estimates = map(data, ~estimate_var_on_grid(process = ., TMax = TMax, grid = grid))`.
✖ Problem with `mutate()` column `slope`.
ℹ `slope = map2_dbl(m, delta, ~compute_slope(process, .x, .y, TMax))`.
✖ could not find function "::""
Calls: compute_metrics_on_stable_splits ... tibble -> tibble_quos -> eval_tidy -> %dopar% -> <Anonymous>
11: (function () 
    traceback(2))()
10: stop(simpleError(msg, call = expr))
9: e$fun(obj, substitute(ex), parent.frame(), e$data)
8: foreach(i = itx, .packages = c("tidyverse", "yardstick", "rsample"), 
       .export = #vector of exports removed for legibility
) %dopar% {
       i %>% 
         pull(splits) %>% 
         .[[1]] %>% 
         train_and_test(., train_grid = grid, my_mset = my_mset, 
                   method = method, TMax = TMax_eval)
       }
   }
7: eval_tidy(xs[[j]], mask)
6: tibble_quos(xs, .rows, .name_repair)
5: tibble(metrics = .)
4: list2(...)
3: bind_cols(select(splits, alpha), .)
2: foreach(i = itx, .packages = c("tidyverse", "yardstick", "rsample"), 
       .export = #vector of exports removed for legibility
) %dopar% {
       i %>% 
         pull(splits) %>% 
         .[[1]] %>% 
         train_and_test(., train_grid = grid, my_mset = my_mset, 
                   method = method, TMax = TMax_eval)
       }
   } %>% 
     tibble(metrics = .) %>% 
     bind_cols(select(splits, alpha), .)
1: compute_metrics_on_stable_splits(method = method, grid = grid, 
       my_mset = metric_set(accuracy, mcc, sens, spec), TMax_eval = TMax_eval, 
       v = 40)

The error is now could not find function "::", which is as weird as the ancestor error. At other times I also received

'rho' must be an environment not pairlist: detected in C-level eval

Apparently, the error can differ even though the code in the script stays the same. At this point any clue would be appreciated. What is weird is that the exact same code sometimes fails with a changing error message and sometimes completes (and if I didn't need to run more computations with this script, I would already be happy with the results I get when the code finishes successfully).

Session Info

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.2 (Ootpa)

Matrix products: default
BLAS/LAPACK: /pfs/data5/software_uc2/all/toolkit/Intel_OneAPI/mkl/2021.4.0/lib/intel64/libmkl_intel_lp64.so.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] yardstick_0.0.9   doParallel_1.0.16 iterators_1.0.13  foreach_1.5.1
 [5] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7       purrr_0.3.4
 [9] readr_2.1.1       tidyr_1.1.4       tibble_3.1.6      ggplot2_3.3.5
[13] tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 haven_2.4.3      colorspace_2.0-2 vctrs_0.3.8
 [5] generics_0.1.1   utf8_1.2.2       rlang_0.4.12     pillar_1.6.4
 [9] glue_1.5.1       withr_2.4.3      DBI_1.1.1        dbplyr_2.1.1
[13] modelr_0.1.8     readxl_1.3.1     lifecycle_1.0.1  plyr_1.8.6
[17] munsell_0.5.0    gtable_0.3.0     cellranger_1.1.0 rvest_1.0.2
[21] codetools_0.2-18 tzdb_0.2.0       fansi_0.5.0      broom_0.7.10
[25] Rcpp_1.0.7       scales_1.1.1     backports_1.4.0  jsonlite_1.7.2
[29] fs_1.5.1         hms_1.1.1        stringi_1.7.6    grid_4.1.2
[33] cli_3.1.0        tools_4.1.2      magrittr_2.0.1   crayon_1.4.2
[37] pkgconfig_2.0.3  ellipsis_0.3.2   xml2_1.3.3       pROC_1.18.0
[41] reprex_2.0.1     lubridate_1.8.0  assertthat_0.2.1 httr_1.4.2
[45] rstudioapi_0.13  R6_2.5.1         compiler_4.1.2
AlbertRapp
  • Could you list the packages that you are using (`sessionInfo()`)? Perhaps a `grep` on their source code might find the culprit. Also, it's useful to add a [traceback](https://rstats.wtf/debugging-r-code.html) – Donald Seinen Dec 06 '21 at 12:55
  • @AlbertRapp in the second last error it states that `x` is not defined in your environment and in last error you have used a function `slice` instead of value. – Isa Dec 06 '21 at 13:58
  • @Isa, yes this is the point. I was trying to find out what has to go wrong for the "ancestor" error to appear. The examples I was trying out yielded different errors. – AlbertRapp Dec 06 '21 at 14:28
  • @DonaldSeinen, this is a great idea. I was already thinking about doing something like this but I don't know how I can get to the source code of all packages (including internal functions) in order to do this. Is there something like a "grab source code"-function? – AlbertRapp Dec 06 '21 at 14:30
  • @AlbertRapp see [this post](https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function) for finding source code of functions, here is [between](https://github.com/tidyverse/dplyr/blob/main/src/funs.cpp) and [filter](https://github.com/tidyverse/dplyr/blob/main/src/filter.cpp), but neither contains the *ancestor* error. Have you managed to narrow it down? Again, please add the `sessionInfo()` to reduce the size of the haystack a bit. – Donald Seinen Dec 06 '21 at 15:35
  • I added the sessionInfo() in case it helps. Still no luck finding the source of the error message. – AlbertRapp Dec 06 '21 at 17:45
  • Take a look here: https://stackoverflow.com/questions/30248583/error-could-not-find-function if you use %>% inside a %dopar% loop, you have to add a reference to load package dplyr (or magrittr, which dplyr loads). – Technophobe01 Dec 20 '21 at 03:22
  • Tidyverse was already exported to the workers but to be sure I also added dplyr and magrittr to the list of packages in foreach. Didn't help. I also rewrote the filter function with base R syntax to avoid dplyr. Now I get a `bad generic call environment` error message. Don't know if that helps... – AlbertRapp Dec 21 '21 at 07:59
  • the %>% op makes debugging harder. I suggest to try to rewrite your code without it to see if it leads to more understandable messages. And does the error happen without the foreach ? Or with only one core ? – Karl Forner Dec 23 '21 at 08:43
  • The error happens only with dopar. And only sometimes. The exact same script sometimes completes without an error and sometimes errors out after a few hours. I am not sure if there is a solution to this, but I also don't understand how such a thing is possible. – AlbertRapp Dec 23 '21 at 12:39
  • @AlbertRapp I've also sometimes received this error with one script I run. It was in dplyr::mutate IIRC. Did you ever try reporting it as an R bug or asking other forums? – Michael McFarlane Mar 10 '22 at 17:28
  • @MichaelMcFarlane I tried asking on [Github](https://github.com/tidyverse/dplyr/issues/6118) but no answer. Nevertheless, I just added an "answer" that helped me. Maybe this could help you too. – AlbertRapp Mar 10 '22 at 18:07

1 Answer

I am not sure that this is a definitive answer, and the problem may well reappear, but by now I believe that it comes from the parallel computation using foreach. It is probably not specific to the foreach package but rather a consequence of race conditions, and I suspect that it is also a Heisenbug.

That being said, what helped the most was making sure that the foreach loop does not terminate when an error occurs in one of the workers. More precisely, I have set the .errorhandling argument to "pass". If an error occurs, the loop then simply writes the error message of that iteration into the result list and collects the other results in the same list too.

In principle, the code looks like this

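results <- foreach(
  i = itx,                  # itx is an iterator created via iter()
  .errorhandling = 'pass',  # keep going; a failed iteration returns its error object
  .packages = c("tidyverse", "yardstick", "rsample"),  # packages attached on each worker
  .export = exports         # placeholder: character vector of object names to export (elided here)
) %dopar% {
  # Code for parallel computation here
}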
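
With "pass", a failed iteration shows up in the result list as an error object, so failures can be separated from the successful results afterwards. Below is a minimal sketch of that idea, assuming a toy task (risky_task is made up for illustration) and using %do% so that it runs without a registered parallel backend; with a backend registered, %dopar% can be used in its place.

```r
library(foreach)

# Toy task (assumption): fails for one input to mimic a sporadic worker error.
risky_task <- function(i) {
  if (i == 3) stop("simulated failure in task ", i)
  i^2
}

# .errorhandling = "pass" keeps the loop running and stores the error object
# of a failed iteration in place of its result.
results <- foreach(i = 1:5, .errorhandling = "pass") %do% risky_task(i)

# Flag the iterations whose result is an error condition.
failed <- vapply(results, inherits, logical(1), what = "error")

failed_tasks <- which(failed)          # indices of the failed iterations
good_values  <- unlist(results[!failed])  # results of the successful ones
```

The failed indices could then be inspected with conditionMessage() or rerun separately, rather than losing hours of completed work to a single failing task.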

Interestingly, once I had added the errorhandling option, no more errors occurred and I could run the script multiple consecutive times without a hitch. Hence my belief that we have a Heisenbug here.

AlbertRapp