24

I'm looking for as much speed as possible and staying in base to do what expand.grid does. I have used outer for similar purposes in the past to create a vector; something like this:

v <- outer(letters, LETTERS, paste0)
unlist(v[lower.tri(v)])

Benchmarking has shown me that outer can be drastically faster than expand.grid but this time I want to create two columns just like expand.grid (all possible combos for 2 vectors) but my methods with outer do not benchmark as fast with outer this time.

I'm hoping to take 2 vectors and create every possible combo as two columns as fast as possible (I think outer may be the route but am wide open to any base method.

Here's the expand.grid method and outer method.

dat <- cbind(mtcars, mtcars, mtcars)

expand.grid(seq_len(nrow(dat)), seq_len(ncol(dat)))

FOO <- function(x, y) paste(x, y, sep=":")
x <- outer(seq_len(nrow(dat)), seq_len(ncol(dat)), FOO)
apply(do.call("rbind", strsplit(x, ":")), 2, as.integer)

The microbenchmarking shows outer is slower:

#     expr      min        lq    median        uq      max
# EXPAND.G  812.743  838.6375  894.6245  927.7505 27029.54
#    OUTER 5107.871 5198.3835 5329.4860 5605.2215 27559.08

I think my outer use is slow because I don't know how to use outer to directly create a length 2 vector that I can do.call('rbind' together. I have to slow paste and slow split. How can I do this with outer (or other methods in base) in a way that's faster than expand grid?

EDIT: Adding the microbenchmark results.

**

Unit: microseconds
      expr     min       lq  median      uq       max
1   ERNEST  34.993  39.1920  52.255  57.854 29170.705
2     JOHN  13.997  16.3300  19.130  23.329   266.872
3 ORIGINAL 352.720 372.7815 392.377 418.738 36519.952
4    TOMMY  16.330  19.5960  23.795  27.061  6217.374
5  VINCENT 377.447 400.3090 418.505 451.864 43567.334

**

enter image description here

Henrik
  • 65,555
  • 14
  • 143
  • 159
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519

4 Answers4

18

The documentation for rep.int isn't quite complete. It isn't just fastest in the most common case because you can pass vectors for the times argument, just like with rep. You can use it straightforward for both sequences reducing the time another 40% or so over Tommy's.

expand.grid.jc <- function(seq1,seq2) {
    cbind(Var1 = rep.int(seq1, length(seq2)), 
    Var2 = rep.int(seq2, rep.int(length(seq1),length(seq2))))
}
Henrik
  • 65,555
  • 14
  • 143
  • 159
John
  • 23,360
  • 7
  • 57
  • 83
  • in it's current state your code throws up an error. I think it's missing a parenthesis somewhere but I couldn't figure out where. – Tyler Rinker May 02 '12 at 12:53
  • +1 Awesome! It *shouldn't* be faster than mine, but it certainly is :-) Apparently, `rep` needs to improve how it handles the `each` argument... – Tommy May 02 '12 at 16:16
  • yes Tommy, it should... I actually think that expand.grid uses something like what I wrote internally... it's just slow because of error checking and robustness. – John May 02 '12 at 17:28
  • What if we just have one list to pass as variable and want all possible combinations ? e.g. expand.grid_jc(input_list) – Rushabh Patel Mar 12 '19 at 14:50
  • I'm not sure what you mean. If you have a list with two vectors in it then you could just use do.call(expand.grid.jc, myList). Otherwise, I'm not sure how the question makes sense. Perhaps expand it to a full CrossValidated question? – John Mar 15 '19 at 14:25
17

Using rep.int:

expand.grid.alt <- function(seq1,seq2) {
  cbind(rep.int(seq1, length(seq2)),
        c(t(matrix(rep.int(seq2, length(seq1)), nrow=length(seq2)))))
}

expand.grid.alt(seq_len(nrow(dat)), seq_len(ncol(dat)))

In my computer is like 6 times faster than expand.grid.

Jay
  • 18,959
  • 11
  • 53
  • 72
Ernest A
  • 7,526
  • 8
  • 34
  • 40
  • I was very skeptical but it is very fast. Nice response. I guess outer wasn't the approach i should have been taking. I posted the microbenchmarking results on a WIn 7 machine above. – Tyler Rinker May 02 '12 at 00:22
  • 1
    @TylerRinker Beware, there was a bug in my function! The `nrow` argument was wrong. I've fixed it now. – Ernest A May 02 '12 at 01:07
  • @ErnestA I added a parenthesis to the far right of the last line of code then had to add some more text before the edit could be submitted. – Mark Miller Jul 22 '13 at 20:22
6

@ErnestA has a great solution well worthy of the answer tick!

...it could be marginally faster though:

expand.grid.alt2 <- function(seq1,seq2) {
  cbind(Var1=rep.int(seq1, length(seq2)), Var2=rep(seq2, each=length(seq1)))
}

s1=seq_len(2000); s2=seq_len(2000)
system.time( for(i in 1:10) expand.grid.alt2(s1, s2) ) # 1.58
system.time( for(i in 1:10) expand.grid.alt(s1, s2) )  # 1.75
system.time( for(i in 1:10) expand.grid(s1, s2) )      # 2.46
Tommy
  • 39,997
  • 12
  • 90
  • 85
  • Very nice, definitely a speed boost over Ernest's. +1 I am going to keep the check on Ernest though this is faster because I've already given the check and I feel funny about reassigning. It would make me feel at complete ease if you based your response on his answer :) May I ask if you did? – Tyler Rinker May 02 '12 at 02:13
  • 1
    @TylerRinker - Yes I did. So you can feel at complete ease now ;-) – Tommy May 02 '12 at 02:43
  • @Tommy I went to great lengths to avoid `rep(... each=)` because I assumed it'd be slower. In fact, it isn't. – Ernest A May 02 '12 at 08:49
  • 3
    @TylerRinker Feel free to uncheck my answer as there are other solutions that run faster. No problem at all! – Ernest A May 02 '12 at 08:52
3

You can create the two columns separately.

library(microbenchmark)
n <- nrow(dat)
m <- ncol(dat)
f1 <- function()   expand.grid(1:n, 1:m)
f2 <- function()   
  data.frame( 
    Var1 = as.vector(outer( 1:n, rep(1,m) )),
    Var2 = as.vector(outer( rep(1,n), 1:m ))
  )
microbenchmark( f1, f2, times=1e6 )
# Unit: nanoseconds
#   expr min  lq median  uq    max
# 1   f1  70 489    490 559 168458
# 2   f2  70 489    490 559 168597
Vincent Zoonekynd
  • 31,893
  • 5
  • 69
  • 78
  • Thanks for the response. You solved my outer problem and the learning was terrific. Ernest's approach is very quick, much quicker than the outer approach. – Tyler Rinker May 02 '12 at 00:21