
I need to create a huge data.frame of combinations, but I don't need all of them. But, as I saw here, the expand.grid function cannot take a condition specifying which combinations to throw out.

So I decided to go step by step. For example I have

variants<-9 # number of possible variants
aa<-c(0:variants) # vector of possible variants
ab<-c(0:variants)
ac<-c(0:variants)
ad<-c(0:variants)
ae<-c(0:variants)
af<-c(0:variants)
ag<-c(0:variants)
ah<-c(0:variants)
ai<-c(0:variants)
aj<-c(0:variants)

If I try to

expand.grid(aa,ab,ac,ad,ae,af,ag,ah,ai,aj)

the "cannot allocate vector of size" error occurs.
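
A rough size estimate shows why this fails. The sketch below assumes the ten columns stay integer (4 bytes per value) and ignores any overhead from expand.grid itself; the variable names are only illustrative.

```r
variants <- 9
n_levels <- variants + 1          # each vector holds the values 0..9
n_cols   <- 10                    # aa, ab, ..., aj

n_rows <- n_levels^n_cols         # rows in the full grid
bytes  <- n_rows * n_cols * 4     # assumes 4-byte integer columns
gib    <- bytes / 1024^3

n_rows                            # number of rows in the full grid
gib                               # approximate size in GiB
```

With ten levels in each of ten columns, the full grid has 1e10 rows, which is far more than typical RAM can hold no matter how the columns are stored.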

So I tried to go step by step like

step<-2 # it is a condition for subsetting the grid
grid_2<-expand.grid(aa,ab)
sub_grid_2<-grid_2[abs(grid_2[,1]-grid_2[,2])<=step,]

which gives me the combinations I need. To save memory, I then add another column like this:

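library(parallel) # mclapply, detectCores
library(plyr)     # ldply

# For row x of sub_grid_2, build one row per candidate value of the third variable
fun_grid_list_3<-function(x){
  a<-sub_grid_2[x,1]
  b<-sub_grid_2[x,2]
  d<-c(0:variants) # candidate values for the new column
  data.frame(Var1=rep(a,length(d)),Var2=rep(b,length(d)),Var3=d)
}

sublist_grid_3<-mclapply(c(1:nrow(sub_grid_2)),fun_grid_list_3,mc.cores=detectCores(),mc.preschedule=FALSE)
sub_grid_3<-ldply(sublist_grid_3) # bind the per-row pieces into one data.frame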

The problem appears once I get to a grid of 8 or more variables. It takes a very long time, even though it should just be adding a number to another data frame. Maybe I am wrong and it truly needs that much time, but I hope there is a more efficient way to do this.

All I need is to create an expand.grid of 2 variables and subset it by a condition, then add another column to the subsetted grid (adding c(0:variants) to every row, which of course creates more rows), subset it by a condition again, and so on.
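
The extend-then-filter procedure described above can be sketched without a per-row lapply by repeating row indices in one vectorized step. This is only a sketch under assumptions: `extend_grid` is a hypothetical helper, and the filter applied to each new column (absolute difference from the previous column at most `step`) is assumed by analogy with the two-column condition, since the conditions for later columns are not given.

```r
variants <- 9
step <- 2
vals <- 0:variants

# Hypothetical helper: cross every row of `grid` with `vals`, then keep only
# rows whose new column is within `step` of the previous column (assumed rule).
extend_grid <- function(grid, vals, step) {
  n <- nrow(grid)
  k <- length(vals)
  # Repeat each existing row k times, then cycle vals alongside the repeats
  out <- grid[rep(seq_len(n), each = k), , drop = FALSE]
  out[[paste0("Var", ncol(grid) + 1)]] <- rep(vals, times = n)
  prev <- out[[ncol(out) - 1]]
  last <- out[[ncol(out)]]
  out <- out[abs(prev - last) <= step, , drop = FALSE]
  rownames(out) <- NULL
  out
}

grid_2 <- expand.grid(Var1 = vals, Var2 = vals)
sub_grid_2 <- grid_2[abs(grid_2$Var1 - grid_2$Var2) <= step, ]

sub_grid_3 <- extend_grid(sub_grid_2, vals, step)

# Repeat for as many columns as needed, here up to 6 variables
grid <- sub_grid_3
for (i in 4:6) grid <- extend_grid(grid, vals, step)
```

Each call still materializes `nrow(grid) * length(vals)` rows before filtering, so memory use depends on how much the condition prunes at every step. The difference from the row-by-row version is that no small data.frame is built per row and nothing has to be bound together afterwards.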

Can anybody help make this faster? I hoped that calling the function through mclapply would be the fastest option, but maybe not.

Thanks to anyone ...

Bury
  • What's your goal? Which are the combinations you want to keep? – nicola Apr 03 '16 at 20:13
  • I believe Rcpp would make it trivial to create combinations and keep only those of interest on-the-go (a slight issue might be preallocating a matrix of good dimensions if the number of useful combinations is hard to guess.) – baptiste Apr 03 '16 at 20:19
  • The goal: I am working on a school project where I want to create combinations of the possible values of the responsible variables, to then see which are better (it is about stocks, because of the data). I know that some combinations are impossible in real situations, so I don't need them (I could subset them afterwards, but I am trying to save memory). So I don't think the problem is in this area. I want to find a faster version of fun_grid_list_3 so I can use it for, e.g., fun_grid_list_8, where the computing time is huge. – Bury Apr 03 '16 at 20:30
  • Does [this](http://stackoverflow.com/questions/36143323/pythons-xrange-alternative-for-r-or-how-to-loop-over-large-dataset-lazilly/36144255#36144255) implementation of a lazy `expand.grid` help? – r2evans Apr 04 '16 at 04:59
  • @r2evans I went through it, but I am not sure I understood it completely. I don't have specific numbers that I want to throw out; it depends, as in the example above, on the absolute difference between specific columns of the grid – Bury Apr 04 '16 at 07:35
  • `lazyExpandGrid(aa,ab,ac,ad,ae,af,ag,ah,ai,aj)` shouldn't give you an allocation error, isn't that the problem you are trying to solve? – r2evans Apr 04 '16 at 14:00
  • @r2evans Yes, that will probably solve my problem, but if I copy the function and then run it like `lazyExpandGrid(aa,ab,ac,ad,ae,af,ag,ah,ai,aj)`, it only shows me the body of the function, not the result. Any advice on how to solve this? Do I need to save something? I am used to just writing a function and having it work. Or do any other packages need to be installed? Thanks a lot – Bury Apr 06 '16 at 07:02
  • Did you read the documentation with it, including step-by-step example usage? – r2evans Apr 06 '16 at 07:04
  • Yes, but it looks like I am not skilled enough yet to write my own iterator. I have installed the iterators package and it still doesn't work. That is, if the documentation you mean is this: https://cran.r-project.org/web/packages/iterators/vignettes/writing.pdf – Bury Apr 06 '16 at 07:34
  • Actually, I thought the answer itself would have been sufficient for a basic (non-`iterators`) usage of the function. I'm sorry I didn't recommend you read through all, including the comment where I referenced [the gist of a more complete implementation](https://gist.github.com/r2evans/e5531cbab8cf421d14ed) (that includes more documentation). – r2evans Apr 06 '16 at 09:35
  • @r2evans Thank you, I may see the problem now. R cannot find the `lengths` function even though I updated all packages, and it should be in the base package. If I try to update the base package, it says it is not available for version 3.1.2 ... it seems it doesn't want me to use the `lazyExpandGrid` function :) – Bury Apr 06 '16 at 10:17
  • You can replace `lengths(x)` with `sapply(x, length)`. – r2evans Apr 06 '16 at 14:06
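
The replacement suggested in the last comment can also be written as a small compatibility shim, so code that calls `lengths()` runs unchanged on an older R. `lengths()` was added to base R in version 3.2.0, which would explain why 3.1.2 cannot find it. This is a minimal sketch: it uses `vapply` instead of `sapply` so the result is always an integer vector, and it does not cover every feature of the real `lengths()` (such as its `use.names` argument).

```r
# Define lengths() only when the running R does not already provide it
if (!exists("lengths", mode = "function")) {
  lengths <- function(x) vapply(x, length, integer(1))
}

x <- list(aa = 0:9, ab = 1:3)
res <- lengths(x)   # element-wise lengths of the list
```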

0 Answers