In short, parallelSVM has no built-in way to handle this. However, the package uses the foreach and doParallel packages for its parallel operations, and after enough digging on Stack Overflow a solution turns out to be possible.
Credits to this answer, on the usage of the doRNG package, and this answer, for giving me the idea for a simpler, self-contained solution.
Solution:
In the parallelSVM package, parallelization is set up by the parallelSVM::registerCores function. This function simply calls doParallel::registerDoParallel with the number of cores and no further arguments. My idea is to replace parallelSVM::registerCores with a version that automatically sets the seed after creating a new cluster.
When performing parallel computation that needs a reproducible seed, there are two things you need to ensure:
- The seed needs to be given to each node in the cluster.
- The generator needs to be one whose streams remain asymptotically random across the nodes of the cluster.
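To make these two requirements concrete, here is a minimal sketch using only the base parallel package, independent of parallelSVM. The helper name draw_on_cluster is hypothetical and only serves the illustration; parallel::clusterSetRNGStream hands each worker its own L'Ecuyer-CMRG stream derived from a single seed.

```r
library(parallel)

# Hypothetical helper: seed a fresh cluster, then draw numbers on each worker.
draw_on_cluster <- function(seed, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))
  # Distribute one L'Ecuyer-CMRG stream per worker, all derived from `seed`
  clusterSetRNGStream(cl, iseed = seed)
  # One task per worker, so the task-to-worker assignment is fixed
  parLapply(cl, seq_len(workers), function(i) runif(3))
}

first  <- draw_on_cluster(1)
second <- draw_on_cluster(1)

identical(first, second)           # same seed: do the two runs agree?
identical(first[[1]], first[[2]])  # do two workers share the same draws?
```

With the same seed the two runs should agree, while the individual workers should not repeat each other's numbers.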
Luckily, the doRNG package handles the first point and uses a generator that is fine for the second. Using a combination of unlockBinding and assign, we can overwrite parallelSVM::registerCores so that it includes a call to doRNG::registerDoRNG with the appropriate seed (function at the end of the answer). With this in place we get proper reproducibility, as illustrated below:
library(parallelSVM)
library(e1071)
data(magicData)
set.seed.parallelSWM(1) #<=== set seed as we would normally.
#Example from help(parallelSVM)
system.time(parallelSvm1 <- parallelSVM(V11 ~ ., data = trainData[,-1],
                                        numberCores = 4, samplingSize = 0.2,
                                        probability = TRUE, gamma = 0.1, cost = 10))
system.time(parallelSvm2 <- parallelSVM(V11 ~ ., data = trainData[,-1],
                                        numberCores = 4, samplingSize = 0.2,
                                        probability = TRUE, gamma = 0.1, cost = 10))
pred1 <- predict(parallelSvm1)
pred2 <- predict(parallelSvm2)
all.equal(pred1, pred2)
[1] TRUE
identical(parallelSvm1, parallelSvm2)
[1] FALSE
Note that identical is too strict to properly assess the objects returned by parallelSVM::parallelSVM, so comparing the predictions is a better way to check whether the models are the same.
For safety, let's check whether this also holds for the reproducible example in the question:
x <- subset(iris, select = -Species)
y <- iris$Species
set.seed.parallelSWM(1) #<=== set seed as we would normally (not necessary if above example has been run).
model <- parallelSVM(x, y)
model2 <- parallelSVM(x, y)
parallelPredicitions <- predict(model, x)
parallelPredicitions2 <- predict(model2, x)
all.equal(parallelPredicitions, parallelPredicitions2)
[1] TRUE
Phew..
Lastly, once we are done, or if we want random seeds again, we can reset the seed by executing:
set.seed.parallelSWM() #<=== set seed to random each execution (standard).
#check:
model <- parallelSVM(x, y)
model2 <- parallelSVM(x, y)
parallelPredicitions <- predict(model, x)
parallelPredicitions2 <- predict(model2, x)
all.equal(parallelPredicitions, parallelPredicitions2)
[1] "3 string mismatches"
(the output will vary, as the RNG seed is not set)
set.seed.parallelSWM function
Credits to this answer. Note that we might not need to do the assignment twice (once for the attached package environment and once for the namespace); I simply replicated the answer without checking whether the code could be reduced further.
set.seed.parallelSWM <- function(seed, once = TRUE){
  if(missing(seed) || is.character(seed)){
    # No usable seed: restore the default behaviour (unseeded cluster)
    out <- function(numberCores){
      cluster <- parallel::makeCluster(numberCores)
      doParallel::registerDoParallel(cluster)
    }
  }else{
    # doRNG is needed to seed the workers reproducibly
    if(!requireNamespace("doRNG", quietly = TRUE))
      stop("Package 'doRNG' is required to set a parallel seed.")
    out <- function(numberCores){
      cluster <- parallel::makeCluster(numberCores)
      doParallel::registerDoParallel(cluster)
      # Wrap the registered backend so that foreach loops use the seed
      doRNG::registerDoRNG(seed = seed, once = once)
    }
  }
  # Replace the function in the attached package environment ...
  unlockBinding("registerCores", as.environment("package:parallelSVM"))
  assign("registerCores", out, "package:parallelSVM")
  lockBinding("registerCores", as.environment("package:parallelSVM"))
  # ... and in the namespace, where the package's own functions look it up
  unlockBinding("registerCores", getNamespace("parallelSVM"))
  assign("registerCores", out, getNamespace("parallelSVM"))
  lockBinding("registerCores", getNamespace("parallelSVM"))
  invisible()
}
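The mechanism the replacement function relies on can also be seen in isolation, without parallelSVM. The following is a minimal sketch, assuming the foreach, doParallel and doRNG packages are installed; the variable names r1 and r2 are only illustrative. It registers a doParallel backend, wraps it with doRNG::registerDoRNG, and runs the same loop twice with the same seed.

```r
library(foreach)
library(doParallel)
library(doRNG)

# Register an ordinary doParallel backend on a small cluster
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Wrap the backend with doRNG and run a loop that draws random numbers
registerDoRNG(seed = 1, once = TRUE)
r1 <- foreach(i = 1:4) %dopar% runif(1)

# Re-register with the same seed and repeat the loop
registerDoRNG(seed = 1, once = TRUE)
r2 <- foreach(i = 1:4) %dopar% runif(1)

parallel::stopCluster(cl)

identical(r1, r2)  # compares the two sets of draws
```

This is the same pair of calls that the seeded registerCores replacement performs each time a cluster is created.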