I'm trying a super-simple foreach .combine task to read a folder of Rds files and rbind them into one:

all_dfs <- foreach(j = list.files(pattern = "\\.Rds$"),
                   .errorhandling = "pass",
                   .combine = rbind,
                   .multicombine = TRUE) %dopar% {
  readRDS(j)
}

And get the error:

error calling combine function: simpleError in rbind(deparse.level, ...): invalid list argument: all variables should have the same length

However, if I loop through all the files and check their lengths, they're all the same (82):

for (j in list.files(pattern = "\\.Rds$")) {
  eachRdsFile <- readRDS(j)
  print(length(eachRdsFile))
}

The foreach error occurs at file 153 of 206; it works on files 1:152. I opened and inspected file 153 and it looks fine, the same as 152. I tried a minimal reproducible example:

library(parallel)
library(doMC)
mycores <- 8
registerDoMC(cores = mycores)
testdfs <- foreach(j = 1:206,.errorhandling = "pass",.combine = rbind,.multicombine = TRUE) %dopar% {
  eachdf <- data.frame(A = runif(10), B = runif(10))}

but it works fine. I ran the foreach on files 1:152, then loaded file 153 and rbind-ed it onto the result, and that also works. Foreach on files 153:206 works (206 is the last file). 55:206 works (152 files). 54:206 fails (153 files). So the issue might be with rbinding 153 or more files? But my reprex succeeded with 206 data frames, so clearly there's no general problem with rbinding 153 or more objects.

Can anyone think of a reason why this might be happening? I'm running out of ideas. This feels like a bug? Thanks in advance.

Edit: Thanks (again) to Florian Privé for help solving this. The problem was related to my use of writeLines and sink (since I can't get print or progress bars to work in parallel, no matter how hard I try):

writeLines(c(""), "log.txt")
all_dfs <- foreach(...){
sink("log.txt", append = TRUE)

When I returned the results as a list instead of using .combine, it revealed that a "sink stack full" error was the actual problem.

I finally fixed that thanks to dmi3kno's answer, and went back to using the .combine approach with no errors.
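For anyone hitting the same thing: dmi3kno's answer isn't reproduced here, so the following is only a generic sketch of one way to avoid filling the sink stack, not necessarily the fix I ended up using. The idea is that every sink(file) opened inside an iteration should be closed by a matching sink() before the iteration returns. The names log_file and process_one are made up for illustration, and lapply stands in for the foreach loop so the example runs without a parallel backend.

```r
# Hypothetical log file, created empty before the loop (as in the question)
log_file <- tempfile(fileext = ".txt")
writeLines("", log_file)

sinks_before <- sink.number()

# Illustrative worker: the same body could go inside a foreach block
process_one <- function(j) {
  sink(log_file, append = TRUE)   # start diverting output to the log
  on.exit(sink(), add = TRUE)     # always close this diversion, even on error
  cat("processing item", j, "\n")
  data.frame(item = j, value = j^2)
}

# Without the matching sink(), repeated calls in one R process
# would eventually fail with "sink stack is full"
results <- lapply(1:50, process_one)

sinks_after <- sink.number()      # same as sinks_before: nothing left open
log_lines <- readLines(log_file)
```

If the only goal is logging, cat("message\n", file = log_file, append = TRUE) writes to the file directly and sidesteps sink altogether.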

dez93_2000
  • Thanks Florian. This led to me finding the actual source of the error - if you'd like to post an answer for credit please do so. Cheers! – dez93_2000 Apr 20 '20 at 18:23

1 Answer

You can remove the .combine to get a list and then use do.call("rbind", your_list).

It will probably be more efficient as well.
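A minimal sketch of that approach, under some assumptions: the temporary folder, the file names and the variables rds_files, df_list and failed are invented for the demo, and %do% is used so it runs without registering a backend (swap in %dopar% after registerDoMC). With .errorhandling = "pass", a failed iteration comes back as a condition object in the list, which is likely what rbind was choking on, so it is worth checking for those before combining.

```r
library(foreach)

# Hypothetical setup: three small data frames saved as .Rds in a temp folder
tmp_dir <- tempfile("rds_demo_")
dir.create(tmp_dir)
for (i in 1:3) {
  saveRDS(data.frame(id = i, A = runif(10), B = runif(10)),
          file.path(tmp_dir, sprintf("file_%d.Rds", i)))
}

rds_files <- list.files(tmp_dir, pattern = "\\.Rds$", full.names = TRUE)

# No .combine: foreach returns a plain list, one element per file
df_list <- foreach(f = rds_files, .errorhandling = "pass") %do% {
  readRDS(f)
}

# Flag iterations that returned an error object instead of a data frame
failed <- vapply(df_list, inherits, logical(1), what = "condition")
print(rds_files[failed])          # files that need a closer look, if any

# Combine only the successful results in a single rbind call
all_dfs <- do.call("rbind", df_list[!failed])
```

Printing df_list[failed] shows the underlying error messages, which is how a hidden problem inside the loop body can surface.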

F. Privé