The following example is based on a discussion about using expand.grid
with large data. As you can see, it ends up with an error. I guess this is due to the number of possible combinations, which according to the mentioned page is 68.7 billion:
> v1 <- c(1:8)
> v2 <- c(1:8)
> v3 <- c(1:8)
> v4 <- c(1:8)
> v5 <- c(1:8)
> v6 <- c(1:8)
> v7 <- c(1:8)
> v8 <- c(1:8)
> v9 <- c(1:8)
> v10 <- c(1:8)
> v11 <- c(1:8)
> v12 <- c(1:8)
> expand.grid(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12)
Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
invalid 'times' value
In addition: Warning message:
In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
NAs introduced by coercion to integer range
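The error itself seems to come from integer overflow: the number of combinations, 8^12, is larger than R's integer range, which matches the "NAs introduced by coercion to integer range" warning:
> 8^12                  # total number of combinations, ~68.7 billion
[1] 68719476736
> .Machine$integer.max  # the largest integer R can represent
[1] 2147483647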
Even with eight vectors it kills my CPU and/or RAM (> expand.grid(v1, v2, v3, v4, v5, v6, v7, v8)). Here I've found some improvements which suggest using outer
or rep.int
. Those solutions work with two vectors, so I have not been able to apply them to 12 vectors, but I guess the principle is the same: they create a large matrix which resides in memory. I'm wondering if there is something like Python's xrange, which evaluates lazily? Here I've found the delayedAssign
function, but I guess this will not help, because the following is also mentioned:
Unfortunately, R evaluates lazy variables when they are pointed to by a data structure, even if their value is not needed at the time. This means that infinite data structures, one common application of laziness in Haskell, are not possible in R.
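A quick check seems to confirm that:
> delayedAssign("x", { cat("evaluating x\n"); 1:8 })
> lst <- list(x)  # merely putting x into a list forces the promise
evaluating x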
Is using nested loops the only solution to this problem?
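To make concrete what kind of laziness I mean, here is a sketch I put together (lazy_row is my own hypothetical helper, not a library function): it computes the i-th row of the would-be expand.grid result directly from the index, so no row exists before it is asked for. Since 8^12 exceeds the integer range, the index has to be passed as a double, but doubles are exact for whole numbers up to 2^53, so the arithmetic below stays exact:
# lazy_row is a hypothetical helper: decode a 1-based index i (a double)
# into the i-th combination, with the first position varying fastest,
# matching the column order expand.grid would produce.
lazy_row <- function(i, sizes) {
  stopifnot(i >= 1, i <= prod(sizes))
  i <- i - 1
  idx <- integer(length(sizes))
  for (k in seq_along(sizes)) {
    idx[k] <- as.integer(i %% sizes[k]) + 1L  # digit in mixed-radix base
    i <- i %/% sizes[k]
  }
  idx
}
lazy_row(1, rep(8, 12))     # 1 1 1 1 1 1 1 1 1 1 1 1
lazy_row(9, rep(8, 12))     # 1 2 1 1 1 1 1 1 1 1 1 1
lazy_row(8^12, rep(8, 12))  # 8 8 8 8 8 8 8 8 8 8 8 8
Looping over i and calling lazy_row would still be slow, but it never materializes the whole matrix, which is the part I care about.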
PS: I don't have a specific problem, but suppose you need to do some computation using a function which accepts 12 integer arguments, for some reason. Also suppose that you need to make all combinations of those 12 integers and save the results to a file. Using 12 nested loops and saving the results to the file continuously will work (it will be slow, but it will not kill your RAM). Here it is shown how you can use expand.grid
and the apply
function to replace two nested loops (see the sketch after the list below). The problem is that creating such a matrix with 12 vectors of length 8 using expand.grid
has some disadvantages:
- generating such a matrix is slow
- such a large matrix consumes a lot of memory (68.7 billion rows and 12 columns)
- further iteration over this matrix using apply is also slow
So from my point of view the functional approach is much slower than the procedural solution. I'm just wondering whether it is possible to lazily create a large data structure which, in theory, does not fit into memory, and to iterate over it. That's all.