
I found this function in another post. Each call returns the next combination of the supplied vectors, or the combinations at the indices you pass in. It is essentially a workaround for expand.grid when there are many vectors with many elements, so the full grid is too large to build.

Here is the function:

lazyExpandGrid <- function(...) {
  dots <- list(...)
  argnames <- names(dots)
  if (is.null(argnames)) argnames <- paste0('Var', seq_along(dots))
  sizes <- lengths(dots)
  indices <- cumprod(c(1L, sizes))
  maxcount <- indices[ length(indices) ]
  i <- 0
  function(index) {
    i <<- if (missing(index)) (i + 1L) else index
    if (length(i) > 1L) return(do.call(rbind.data.frame, lapply(i, sys.function(0))))
    if (i > maxcount || i < 1L) return(FALSE)
    setNames(Map(`[[`, dots, (i - 1L) %% indices[-1L] %/% indices[-length(indices)] + 1L  ),
             argnames)
  }
} 
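To make the index arithmetic in the closure easier to follow, here is a minimal sketch that decodes a single index the same way and checks it against expand.grid on a small grid. The helper name `decodeIndex` is hypothetical and not part of the original function; it only mirrors the modulo and integer-division step, under the assumption that the first vector varies fastest.

```r
# Hypothetical helper (not part of the original function): decode one
# 1-based index into a combination, with the first vector varying fastest.
decodeIndex <- function(i, vecs) {
  sizes   <- lengths(vecs)
  strides <- cumprod(c(1, sizes))          # 1, n1, n1*n2, ...
  # position of the chosen element inside each vector
  pos <- ((i - 1) %% strides[-1]) %/% strides[-length(strides)] + 1
  Map(`[[`, vecs, pos)
}

vecs <- list(a = 0:3, b = 0:4, c = 0:5)
grid <- expand.grid(vecs)                  # small enough to build in full

row7 <- decodeIndex(7, vecs)               # a = 2, b = 1, c = 0

# Every decoded index should agree with the matching expand.grid row
same <- vapply(seq_len(nrow(grid)), function(i) {
  all(unlist(decodeIndex(i, vecs)) == unlist(grid[i, ]))
}, logical(1))
all(same)
```

Because an index maps directly to a row, a condition on the row can only be checked after decoding that index; sample() itself never sees the row contents.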

Here are some example calls:

set.seed(42)
nxt <- lazyExpandGrid(a=1:1e2, b=1:1e2, c=1:1e2, d=1:1e2, e=1:1e2, f=1:1e2)
as.data.frame(nxt()) # prints the 1st possible combination
nxt(sample(1e2^6, size=7)) # prints 7 sampled rows from the sample space

What I cannot figure out is how to sample conditionally using lazyExpandGrid. I would like to exclude samples that contain a certain number of a particular value.

For example, say I have these vectors for which I want to create unique combinations: a=0:3, b=0:4, c=0:5. They give 4 × 5 × 6 = 120 combinations, so I could draw 50 of them using: nxt(sample(120, size=50, replace = F)).

But let's say I am not interested in samples that contain exactly two 0s. How could I exclude these samples? I've tried things like: nxt(sample(which(!(sum(as.data.frame(nxt()) == 0)==2)), size=50, replace = F)).

I just don't understand how to reference the sampled row inside sample() so that I can exclude it if it doesn't meet a certain criterion.

RTrain3k
  • You would have to pre-compute the valid indices not matching your exclusion criterion. Alternatively, you could drop samples that don't fit after the call to `nxt`. – RolandASc Feb 07 '18 at 16:23
  • So the only way to do it would be to pre-compute, or something like this: `nxt <- lazyExpandGrid(a=0:3, b=0:4, c=0:5); x <- as.data.frame(nxt()); for (i in 1:119) { y <- as.data.frame(nxt()); if (!(sum(y == 0) == 2)) { x <- rbind(x, y) } }; x` – RTrain3k Feb 07 '18 at 17:00
  • The second method seems inefficient because I do not necessarily need all of the possible combinations meeting the criteria, and with many vectors with many elements the possible combinations would get enormous. Is there a way to combine sample with the above code? It would need to sample without replacement. – RTrain3k Feb 07 '18 at 17:06
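The pre-computation suggested in the first comment can be sketched as follows for the small example. This assumes the grid is small enough to materialise with expand.grid, which is exactly what stops being possible for the large case; the variable names `valid`, `picked` and `smp` are illustrative only.

```r
vecs <- list(a = 0:3, b = 0:4, c = 0:5)
grid <- expand.grid(vecs)                  # 120 rows; only feasible for small grids

# Indices of rows that do NOT contain exactly two zeros
valid <- which(rowSums(grid == 0) != 2)

set.seed(42)
picked <- sample(valid, size = 50)         # sample() draws without replacement by default
smp <- grid[picked, ]
```

expand.grid also varies the first vector fastest, so passing `picked` to `nxt` should return the same rows without keeping the full grid around after `valid` has been computed.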

1 Answer


If you want to drop rows that don't meet a condition, I don't think you need to worry about sampling without replacement, as passing the same value to `nxt` should generate an identical row, which would still be dropped. It might work, then, to write a wrapper around the function as you've defined it above that simply redraws whenever a `nxt`-generated row doesn't meet the condition you're after. Here, a row is dropped if it contains exactly two zeroes:

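set.seed(0123)

nxt <- lazyExpandGrid(a = 0:3, b = 0:4, c = 0:5)

# Draw n_row rows, redrawing any row that contains exactly two zeroes.
# samp is passed to sample(), so a single number N means "sample from 1:N".
nxtDrop <- function(samp, n_row) {
  t(sapply(seq_len(n_row), function(x) {
    y <- nxt(sample(samp, 1))
    # Count zeroes by value; grep(0, y) would also match values such as 10
    while (sum(unlist(y) == 0) == 2) {
      y <- nxt(sample(samp, 1))
    }
    y
  }))
}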

> nxtDrop(120, 10)
      a b c
 [1,] 2 3 1
 [2,] 2 3 4
 [3,] 1 2 2
 [4,] 1 1 5
 [5,] 0 3 5
 [6,] 1 1 0
 [7,] 3 0 3
 [8,] 3 1 5
 [9,] 2 1 3
[10,] 2 3 2
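
The wrapper above draws each index independently, so an accepted row can appear more than once. If distinct rows are required, one possible variant is to draw indices in batches, remember which indices were already tried, and keep only rows that pass the filter. The sketch below is an assumption-laden illustration rather than a drop-in replacement: `decodeRows` and `sampleValid` are hypothetical names, and `decodeRows` is a vectorised stand-in for `nxt` so that the example runs on its own.

```r
# Hypothetical stand-in for nxt(): decode a vector of 1-based indices into
# rows of the grid, first vector varying fastest (same order as expand.grid).
decodeRows <- function(idx, vecs) {
  sizes   <- lengths(vecs)
  strides <- cumprod(c(1, sizes))[seq_along(sizes)]
  as.data.frame(Map(function(v, s, n) v[((idx - 1) %/% s) %% n + 1],
                    vecs, strides, sizes))
}

# Hypothetical sampler: up to n distinct rows for which keep(rows) is TRUE.
sampleValid <- function(vecs, n, keep, batch = 2 * n) {
  total <- prod(lengths(vecs))             # size of the full grid
  tried <- numeric(0)                      # indices drawn so far
  out   <- decodeRows(numeric(0), vecs)    # zero-row frame with the right columns
  while (nrow(out) < n && length(tried) < total) {
    # distinct within the batch, and not seen in earlier batches
    idx <- setdiff(sample.int(total, min(batch, total)), tried)
    tried <- c(tried, idx)
    if (length(idx) == 0L) next
    rows <- decodeRows(idx, vecs)
    out  <- rbind(out, rows[keep(rows), , drop = FALSE])
  }
  out <- head(out, n)
  rownames(out) <- NULL
  out
}

set.seed(1)
res <- sampleValid(list(a = 0:3, b = 0:4, c = 0:5), 50,
                   keep = function(rows) rowSums(rows == 0) != 2)
```

Only the tried indices are stored, never the full grid, so memory grows with the number of draws rather than with the number of combinations. If fewer than n valid rows exist, the function returns as many as it found.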
Luke C