
I am trying to use `expand.grid()` on 16 of my predictor variables. Several are factors with 12, 40, or 48 levels, and the largest has 462 levels. Whenever I call `expand.grid()` in R, I get the "vector is too large" error. `memory.limit()` shows that 8 GB is available, and I'm running 64-bit R. Any suggestions on how to do this? If the result can't be computed in memory, could the output be written to disk instead?

Thanks. Thomas.

Thomas Moore
  • What are you trying to achieve? Why do you want to compute this massive table? – asachet Jun 29 '17 at 16:38
  • Perhaps using `spark` and making the grid on the external machine would solve your problem? – MikolajM Jun 29 '17 at 16:40
  • I want to input all possible combinations of predictor variables for a neural network model I fitted. – Thomas Moore Jun 29 '17 at 16:43
  • Have you checked the solutions here? (https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes) – MikolajM Jun 29 '17 at 16:46
  • Yes. By my estimate, the vector that R is trying to create is roughly 4 GB in size, but `memory.limit()` says I have 8 GB of memory available, so I don't know what's going on! – Thomas Moore Jun 29 '17 at 16:49
  • See [here](https://stackoverflow.com/questions/36143323/pythons-xrange-alternative-for-r-or-how-to-loop-over-large-dataset-lazilly) for a similar post – alexis_laz Jun 29 '17 at 16:52
  • Hi @alexis_laz I will check that out, thanks. – Thomas Moore Jun 29 '17 at 16:55
  • @ThomasMoore I have 16 GB of RAM and an SSD. You can send me your code and I can try to run it on my machine. If it succeeds, then I will write the result to a CSV and send it to you – MikolajM Jun 29 '17 at 17:10
  • @ThomasMoore - If the final result is going to take 4 GB, then the computation will very likely need more space than that. You could always do part of it 'by hand'. Split your variables in half and use `expand.grid` on each half. Then take one row of the first result, attach it to every row of the second, do whatever you want to do with those combinations, and store the output. Then move on to the next row of the first half and repeat. – Dason Jun 29 '17 at 17:34
  • Have you calculated how many rows this is? Assuming your smallest predictor has 12 levels, then 12*12*12*12*12*12*12*12*12*12*12*12*12*40*48*462 = 9.5e+19. The answer will probably depend on a better estimate of the size of the problem you're attempting. – Eric Watt Jun 29 '17 at 20:18
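
The row-count estimate and the "generate a piece at a time" idea from the comments can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a tested solution for the poster's data: the twelve level counts not given in the question are assumed to be 12 (as in the estimate above), `grid_rows` is a hypothetical helper rather than a base R function, and row indices are held as doubles, so the decoding is only exact for grids with fewer than 2^53 rows.

```r
# Assumed level counts: the question names only 12, 40, 48 and 462;
# the remaining variables are assumed to have 12 levels each.
n_levels <- c(rep(12, 13), 40, 48, 462)

# Number of rows expand.grid() would have to create (about 9.5e19).
total_rows <- prod(n_levels)

# Rough size of the full grid if every cell were a 4-byte integer code.
total_bytes <- total_rows * length(n_levels) * 4

# Hypothetical helper: return rows `idx` (1-based) of the grid that
# expand.grid(levels_list) would produce, without building the whole grid.
# expand.grid() varies the first variable fastest, so a row index can be
# decoded like a mixed-radix number, one variable at a time.
grid_rows <- function(levels_list, idx) {
  sizes <- lengths(levels_list)
  stopifnot(all(idx >= 1), all(idx <= prod(sizes)))
  rem <- idx - 1
  out <- vector("list", length(levels_list))
  names(out) <- names(levels_list)
  for (j in seq_along(levels_list)) {
    out[[j]] <- levels_list[[j]][rem %% sizes[[j]] + 1]
    rem <- rem %/% sizes[[j]]
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Usage on a small toy grid: walk through it in chunks of 4 rows.
small <- list(a = 1:3, b = c("x", "y"))
n <- prod(lengths(small))
chunk_size <- 4
for (start in seq(1, n, by = chunk_size)) {
  chunk <- grid_rows(small, start:min(start + chunk_size - 1, n))
  # Process `chunk` here, e.g. pass it to the fitted model's predict method
  # and store only the predictions (the model object is not shown).
}
```

Working in chunks keeps memory use bounded by the chunk size. With roughly 9.5e19 combinations, though, chunking alone would not make a full enumeration feasible, so reducing the number of variables or levels, or sampling row indices, would likely be needed as well.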

0 Answers