I need to create a huge data.frame of combinations, but I don't need all of them. But, as I saw here, the `expand.grid` function cannot take a condition specifying which combinations to throw out.
So I decided to go step by step. For example, I have:

    variants <- 9      # number of possible variants
    aa <- 0:variants   # vector of possible variants: 0, 1, ..., variants
    ab <- 0:variants
    ac <- 0:variants
    ad <- 0:variants
    ae <- 0:variants
    af <- 0:variants
    ag <- 0:variants
    ah <- 0:variants
    ai <- 0:variants
    aj <- 0:variants
If I try

    expand.grid(aa, ab, ac, ad, ae, af, ag, ah, ai, aj)

I get the "cannot allocate vector of size" error.
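A quick back-of-the-envelope check shows why the full grid cannot fit: ten columns with ten possible values each give 10^10 rows, and R stores each integer in 4 bytes. The sketch below only computes that size; the real peak is likely higher, since `expand.grid` also builds temporary vectors along the way.

```r
variants <- 9
n_vars <- 10                              # aa, ab, ..., aj
n_rows <- (variants + 1)^n_vars           # every combination of the 10 columns
bytes_per_value <- 4                      # R integers take 4 bytes each
total_gb <- n_vars * n_rows * bytes_per_value / 1e9

n_rows    # 1e+10 rows
total_gb  # 400 GB for the finished data frame alone
```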
So I tried to go step by step, like this:

    step <- 2  # subsetting condition: maximum allowed |Var1 - Var2|
    grid_2 <- expand.grid(aa, ab)
    sub_grid_2 <- grid_2[abs(grid_2[, 1] - grid_2[, 2]) <= step, ]
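To see how much the condition prunes, the row counts can be compared directly. This repeats the question's subsetting with the same values (`variants = 9`, `step = 2`); the per-value counts show that each first value keeps at most 2 * step + 1 = 5 partners.

```r
variants <- 9
step <- 2
grid_2 <- expand.grid(0:variants, 0:variants)
sub_grid_2 <- grid_2[abs(grid_2[, 1] - grid_2[, 2]) <= step, ]

nrow(grid_2)             # 100 combinations before filtering
nrow(sub_grid_2)         # 44 combinations kept
table(sub_grid_2[, 1])   # partners per first value: 3 4 5 5 5 5 5 5 4 3
```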
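This gives me the combinations I need. To save memory, I then add another column like this:

    library(parallel)  # mclapply(), detectCores()
    library(plyr)      # ldply()

    # For row x of sub_grid_2, return that row repeated once per candidate value
    # of the third variable, with the candidate values stored in column Var3
    fun_grid_list_3 <- function(x) {
      a <- sub_grid_2[x, 1]
      b <- sub_grid_2[x, 2]
      d <- 0:variants  # candidate values 0..variants, same as aa, ab, ...
      data.frame(Var1 = rep(a, length(d)), Var2 = rep(b, length(d)), Var3 = d)
    }
    sublist_grid_3 <- mclapply(seq_len(nrow(sub_grid_2)), fun_grid_list_3,
                               mc.cores = detectCores(), mc.preschedule = FALSE)
    sub_grid_3 <- ldply(sublist_grid_3)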
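The per-row function only repeats a row of `sub_grid_2` and appends the candidate values, so the same table can be built in one vectorized step with `rep()`, with no per-row function call and no list to bind afterwards. A minimal sketch, assuming the third column takes the values `0:variants`:

```r
variants <- 9
step <- 2
vals <- 0:variants
grid_2 <- expand.grid(vals, vals)
sub_grid_2 <- grid_2[abs(grid_2[, 1] - grid_2[, 2]) <= step, ]

# Repeat each row of sub_grid_2 once per candidate value (like rep(a, ...) in the
# per-row function), then attach the candidate values in the same order
idx <- rep(seq_len(nrow(sub_grid_2)), each = length(vals))
sub_grid_3 <- sub_grid_2[idx, ]
sub_grid_3$Var3 <- rep(vals, times = nrow(sub_grid_2))
rownames(sub_grid_3) <- NULL

nrow(sub_grid_3)  # 440 = 44 rows x 10 values
```

The row order matches what the `mclapply` + `ldply` version produces: all extensions of the first kept row, then all extensions of the second, and so on.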
But the problem comes when I get to grids of 8 or more variables: it takes a very long time, even though each step should just be adding a number to another frame. Maybe I am wrong and it truly needs that time, but I hope there is a more efficient way to do this.
All I need is to create an `expand.grid` of 2 variables and subset it by a condition. Then add another column that respects the subsetted grid (append every value of `c(0:variants)` to every row, which of course creates more rows), subset the result by the condition again, and so on.
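The procedure described above (grid, filter, add a column, filter again, repeat) can be sketched as a loop that never materializes the unfiltered rows as a data frame. This assumes the same condition as in the question, applied between each new column and the previous one (`abs(new - previous) <= step`); `build_filtered_grid` is a hypothetical helper name.

```r
# Hypothetical helper: builds the grid one column at a time, keeping only rows
# where each new value is within `step` of the previous column's value
build_filtered_grid <- function(n_vars, vals, step) {
  grid <- data.frame(Var1 = vals)
  for (k in seq_len(n_vars)[-1]) {
    # Pair every current row with every candidate value using plain vectors
    idx <- rep(seq_len(nrow(grid)), each = length(vals))
    new_col <- rep(vals, times = nrow(grid))
    keep <- abs(new_col - grid[[k - 1]][idx]) <= step
    # Copy only the surviving rows into the data frame, then append the new column
    grid <- grid[idx[keep], , drop = FALSE]
    grid[[paste0("Var", k)]] <- new_col[keep]
    rownames(grid) <- NULL
  }
  grid
}

variants <- 9
step <- 2
grid_4 <- build_filtered_grid(4, 0:variants, step)
```

Under that assumption the filtered grid stays small: each row can gain at most 2 * step + 1 = 5 values per new column, so ten columns give at most 10 * 5^9 = 19,531,250 rows instead of 10^10.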
Can anybody help make this faster? I hoped that calling a function through `mclapply` would be the fastest approach, but maybe not.
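One likely cost: per the `parallel` package documentation, `mc.preschedule = FALSE` forks one job for each element of `X`, so the code above forks a process for every row of `sub_grid_2` just to build a tiny data frame. A sketch of the alternative, assuming a few large contiguous chunks (one per worker) are each extended with vectorized `rep()`; `extend_chunk` is a hypothetical name and the choice of 2 workers is arbitrary.

```r
library(parallel)  # mclapply()

variants <- 9
step <- 2
vals <- 0:variants
grid_2 <- expand.grid(vals, vals)
sub_grid_2 <- grid_2[abs(grid_2[, 1] - grid_2[, 2]) <= step, ]

# Vectorized extension of a block of rows by one column (Var3 = each value in vals)
extend_chunk <- function(rows) {
  out <- rows[rep(seq_len(nrow(rows)), each = length(vals)), , drop = FALSE]
  out$Var3 <- rep(vals, times = nrow(rows))
  out
}

# One contiguous chunk per worker, so at most n_workers processes are forked.
# Forking is unavailable on Windows, so fall back to a single serial worker there.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L
n <- nrow(sub_grid_2)
chunk_id <- ceiling(seq_len(n) * n_workers / n)
pieces <- mclapply(split(sub_grid_2, chunk_id), extend_chunk, mc.cores = n_workers)
sub_grid_3 <- do.call(rbind, pieces)
rownames(sub_grid_3) <- NULL
```

For a table this small the serial vectorized version is probably just as fast; chunking only pays off once each chunk carries enough work to outweigh the cost of forking.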
Thanks in advance.