So I have a large list of combinations from a dataset that I'm running a simple lm regression on. However, the combination list is very long, and it takes a long time to run lm for every entry. I googled around and came upon the parallel package and started to understand mclapply, but then realized that it doesn't work on Windows. Then I came upon future.apply::future_lapply.
So basically this is the part of my function that is the slowest:
fmla <- combinations %>%
  apply(1, paste, collapse = " + ") %>%
  gsub(pattern = " \\+ NA", replacement = "", x = .) %>%
  paste(y_variable, "~", .)        # y_variable holds the response name, e.g. "y"
fmla_nocons <- paste(fmla, "- 1")  # drop the intercept
# run lm models
model <- lapply(fmla_nocons, function(x) lm(x, data = df))
My combinations object is basically a matrix that looks like:
var1        var2        var3
variable 1  variable 2  variable 3
variable 2  variable 3  variable 4
...         ...         ...
This is a very long list, so the first step is turning every row into y ~ variable1 + variable2 + variable3, and the second step is using lapply to run an lm regression on each of the different combinations.
However, from my research, future_lapply will run it on a multicore system (correct me if I misunderstood). Will I also need to set up clusters like with mclapply's relatives in the parallel package, or is it as simple as replacing lapply(data, function(x) lm(x, data = df)) with future_lapply(data, function(x) lm(x, data = df))?
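For what it's worth, here is a sketch of how I understand the swap would look, assuming toy data and formula strings in place of my real df and fmla_nocons. With future.apply you pick a plan() instead of building a cluster by hand; multisession uses background R sessions, which works on Windows (unlike the forking that mclapply relies on):

```r
library(future.apply)

# Hypothetical toy data and formula strings standing in for df / fmla_nocons
df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fmla_nocons <- c("y ~ x1 - 1", "y ~ x1 + x2 - 1")

# multisession launches background R sessions, so it works on Windows;
# future.apply exports df and other globals to the workers automatically
plan(multisession, workers = 2)

models <- future_lapply(fmla_nocons, function(x) lm(x, data = df))

plan(sequential)  # shut the workers down when done
```

So as far as I can tell, the call itself is a drop-in replacement; the only extra step is choosing the plan() up front.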
Any feedback or input would be helpful, and thanks for your time!