I have a big data.table. Each parallel process reads from it, processes the data, and returns a much smaller data.table. I don't want the big DT to be copied to all processes, but it seems the `%dopar%` function in the `foreach` package has to copy it.

Is there a way to have the object shared across all processes (on Windows)? That is, by using a package other than `foreach`.
Example code
library(doParallel)
library(data.table)  # needed on the master for data.table() and [.data.table

cluster = makeCluster(4)
registerDoParallel(cluster)

M = 1e4 # make this larger
dt = data.table(x = rep(LETTERS, M), y = rnorm(26 * M))

# .packages loads data.table on each worker so [.data.table works there too
res = foreach(trim = seq(0.6, 0.95, 0.05), .combine = rbind,
              .packages = "data.table") %dopar% {
  dt[, .(trimmean = mean(y, trim = trim)), by = x][, trim := trim]
}

stopCluster(cluster)
(I'm not interested in a better way of doing this in data.table without parallelism. This is just to show a case where the subprocesses need to read all of the data to do their processing, but never change it.)
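For illustration, a minimal sketch of how one can confirm the copying behavior described above: `foreach` auto-exports `dt` to the PSOCK workers, and measuring the object's size from inside each worker shows that every worker holds its own private, deserialized copy (this uses base `object.size`; the cluster setup mirrors the example code):

```r
library(doParallel)
library(data.table)

cl <- makeCluster(2)
registerDoParallel(cl)

dt <- data.table(x = rep(LETTERS, 1e4), y = rnorm(26e4))

# foreach serializes `dt` over the socket to each worker; object.size
# inside the loop body reports the size of that worker's local copy
sizes <- foreach(i = 1:2, .combine = c, .packages = "data.table") %dopar% {
  as.numeric(object.size(dt))
}

stopCluster(cl)

sizes  # one (equal) size per worker: the data exists once per process
```

If sharing worked, per-worker memory would not grow with the size of `dt`; here it does, which is exactly the problem the question is about.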