I'm trying a super-simple foreach .combine task to read a folder of Rds files and rbind them into one:
all_dfs <- foreach(j = list.files(pattern = ".Rds"),
                   .errorhandling = "pass",
                   .combine = rbind,
                   .multicombine = TRUE) %dopar% {
  eachRdsFile <- readRDS(j)
}
And get the error:
error calling combine function: simpleError in rbind(deparse.level, ...): invalid list argument: all variables should have the same length
However, if I loop through all the files and check their length, they're all the same (82):
for (j in list.files(pattern = ".Rds")) {
  eachRdsFile <- readRDS(j)
  print(length(eachRdsFile))
}
The foreach error occurs on file 153 of 206. It works on 1:152. I opened and inspected file 153 and it looks fine, same as 152. I tried a minimal reproducible example:
library(parallel)
library(doMC)
mycores <- 8
registerDoMC(cores = mycores)
testdfs <- foreach(j = 1:206,
                   .errorhandling = "pass",
                   .combine = rbind,
                   .multicombine = TRUE) %dopar% {
  eachdf <- data.frame(A = runif(10), B = runif(10))
}
but it works fine. I ran the foreach for 1:152, then loaded file 153 and rbound them together, and that works fine too. foreach on files 153:206 works fine (206 is the last file). 55:206 works fine (152 files). 54:206 fails (153 files). So it may be that the issue is with rbinding >=153 files? My reprex attempt succeeded with 206 files, so there's clearly no problem with rbinding >=153 of ANY object.
Can anyone think of a reason why this might be happening? I'm running out of ideas. This feels like a bug? Thanks in advance.
Edit: Thanks (again) to Florian Privé for help with the solve. The problem was related to my use of writeLines and sink (since I can't get print (or progress bars) to work in parallel no matter how hard I try):
writeLines(c(""), "log.txt")
all_dfs <- foreach(...){
sink("log.txt", append = TRUE)
When I output the results to a list instead of using .combine, it revealed that the "sink stack is full" error was the problem.
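For anyone debugging something similar, here is a minimal sketch of the "output to a list" check. It assumes the foreach package is installed and uses %do% so no parallel backend is needed. The simulated failure in task 3 and the names res_list, failed and ok are made up for illustration. With .errorhandling = "pass" and no .combine, a failed task shows up as an error object in the returned list, which is what rbind later chokes on.

```r
library(foreach)

# No .combine: foreach returns a plain list, one element per task.
# With .errorhandling = "pass", a failing task contributes its error object.
res_list <- foreach(j = 1:5, .errorhandling = "pass") %do% {
  if (j == 3) stop("simulated failure in task ", j)  # hypothetical failure
  data.frame(A = j, B = j * 2)
}

# Flag the elements that are error objects rather than data frames
failed <- vapply(res_list, inherits, logical(1), what = "error")
which(failed)

# Inspect the messages of the failed tasks
lapply(res_list[failed], conditionMessage)

# Bind only the successful results
ok <- do.call(rbind, res_list[!failed])
```

Reading the messages of the failed elements is usually enough to see the real cause, which the combine error hides.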
I finally fixed that thanks to dmi3kno's answer, and went back to using the .combine approach with no errors.
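A rough sketch of the idea, for later readers. This is not necessarily what the linked answer does; it is one way to keep the sink stack from filling up. As far as I understand it, each sink(file) call pushes a diversion onto a fixed-size stack and only a matching sink() pops it, so a worker that handles enough tasks without closing its sink eventually overflows the stack. The helper log_task and the temporary log file are hypothetical, and the sketch uses plain lapply so it runs without a parallel backend.

```r
log_file <- tempfile(fileext = ".txt")  # stand-in for "log.txt"

# Hypothetical task body wrapped in a function so on.exit() can be used:
# every sink(file) is paired with a sink() that closes it again.
log_task <- function(j, log_path) {
  sink(log_path, append = TRUE)
  on.exit(sink(), add = TRUE)  # pop the diversion even if the task errors
  cat("processing", j, "\n")
  j
}

depth_before <- sink.number()
results <- lapply(1:50, log_task, log_path = log_file)
depth_after <- sink.number()

# The diversion depth is unchanged, so repeated tasks cannot fill the stack
depth_after == depth_before
```

The same function could be called from inside the foreach body; the point is only that each task closes the diversion it opened.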