I need to estimate a transition Matrix. Since I have lots of data I tried to run it parallel using foreach
and I tried the shared-memory function FBM()
of bigstatsr
. And it seems that the function does not all the time return the right result. (Sometimes it does.) Could it be the case that the function doesn’t work properly?
Here is an example when the code works correctly:
x <- c(1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3)
n <- length(unique(x))
A <- matrix(nrow = n, ncol = n, 0)
for (t in 1:(length(x) - 1)) {A[x[t], x[t + 1]] <- A[x[t], x[t + 1]] + 1}
A
And here is the Code that does not always return the correct result:
library(foreach)
library(doParallel)
library(bigstatsr)
cl <- makeCluster(8)
registerDoParallel(cl)
B <- FBM(n, n)
set.seed(3)
foreach (t = 1:(length(x) - 1)) %dopar% {B[x[t], x[t + 1]] <- B[x[t], x[t + 1]] + 1}
stopCluster(cl)
B[]
identical(A,B[])
The same happens when using snow
library
library(snow)
library(bigstatsr)
cl <- makeCluster(8)
f.trans.m <- function(t) {
D[x[t], x[t + 1]] <<- D[x[t], x[t + 1]] + 1
}
D <- FBM(n, n)
clusterExport(cl, "f.trans.m")
clusterExport(cl, "D")
clusterExport(cl, "x")
parLapply(cl, seq(1,(length(x) - 1)), function(t) f.trans.m(t))
D[]
identical(A,D[])
Do I use the package correctly, or is there a bug in FBM()
?
a solution:
A file lock was missing which is provided by the package flock
.
B <- FBM(n, n)
lock <- tempfile()
foreach (t = 1:(length(x) - 1)) %dopar% {
locked <- flock::lock(lock)
B[x[t], x[t + 1]] <- B[x[t], x[t + 1]] + 1
flock::unlock(locked)
}