0

I'm making a "pairwise" array in R. Given the vector combo, I'm finding every permutation of 4 elements. Thus, a 4-dimensional "pairwise" array. My current approach is making it as a simple list, using nested sapply functions, like so:

fourList <- sapply(X = combo, FUN = function(h) {
    hi <- which(combo == h) #get index of h

    sapply(X = combo[hi:n], FUN = function(i) {
      ii <- which(combo == i) #get index of i

      sapply(X = combo[ii:n], FUN = function(j) {
        ji <- which(combo == j) #get index of j

        sapply(X = combo[ji:n], FUN = function(k) {
          list(c(h,i,j,k))
        })
      })
    })
  })

I'd like to make some sort of progress indicator, so I can report to the user what percentage of the array has been built. Ideally, I'd just take numberCasesCompleted and divide that by totalCases = length(combo)^4 to get the fraction that is done. However, I can't seem to figure out an algorithm that takes in hi, ji, and ii, and outputs the value numberCasesCompleted. How can I calculate this?

In the 2D (x by y) case (e.g: sapply(X, function(x) {sapply(X[xi:n], function(y) {list(c(x,y))}}), this could be calculated by sum(n - (x-2:x), y-(x-1)), but generalizing that to 4 dimensions sounds rather difficult.

twieg
  • 53
  • 2
  • 9
  • Any particular reason you're rolling your own permutation algorithm rather than using a optimized version? – Gregor Thomas Jul 13 '18 at 23:30
  • The easiest way would be to use `for` loops instead of `sapply`, and create a `numberCasesCompleted` counter that increments each time through the inner loop. – Gregor Thomas Jul 13 '18 at 23:33
  • @Gregor I'm not aware of an optimized version. Can you point me in the right direction? – twieg Jul 13 '18 at 23:34
  • I have tried using `for` loops (and indeed, incrementing a simple counter worked great for that), but `apply` functions are **so** much faster. I'm working with a dataset where `length(combo) = 127`, so it takes a **long time** (several hours) to run with `for` loops, while `apply` gets the job done in a few minutes. – twieg Jul 13 '18 at 23:36
  • 1
    [This answer is extremely thorough for all combinatorial needs](https://stackoverflow.com/a/47983855/903061). I'd be willing to bet it's recommendation, `rcppAlgos::permuteGeneral(combo, 4)` is at least 3 orders of magnitude faster than the `sapply`/`list` approach on a decently-sized input. – Gregor Thomas Jul 13 '18 at 23:38
  • Also `sapply` and `apply` aren't that different from `for`. If you see that much of a difference between `for` and `sapply` then you are doing something wrong in the `for` loop code. [See here for more info](https://stackoverflow.com/q/2275896/903061). I'd suggest getting a code review of your `for` loop code to figure out what you're doing to slow it down so badly. – Gregor Thomas Jul 13 '18 at 23:39
  • In fact, newer versions of R (>= 3.4.0) with JIT compilation can have `for` faster than `sapply`. Try out `microbenchmark::microbenchmark( sapply = sapply(1:n, I), for_loop = {result = integer(n); for(i in 1:n) result[i] = I(i)}, times = 10 )` to see. When I just ran it, `for` came out about 0.02 seconds faster than `sapply`. – Gregor Thomas Jul 13 '18 at 23:46
  • And yes, `RcppAlgos::permuteGeneral(v = 1:127, m = 4)` takes less than 3 seconds on my laptop. No progress bar needed. – Gregor Thomas Jul 13 '18 at 23:50

1 Answers1

0

I'm stupid. Just add the proportion complete of the first level to the proportion complete of the second level (scaled down to a single iteration at the first level), and so forth.

In my case: completion <- hi/(n+1) + (ii/(n+1))*(1/n) + (ji/n)*(1/n)*(1/n)

(The n+1 denominators are there because there's effectively another loop after hi is equal to n, as ii still has a full set of iterations to complete. Otherwise it would end at ~101%. But for a rough/quick estimation of progress, this is fine.)

However, it is worth noting that (according to @Gregor in the comments) there are much better ways of making combinations in R, so my original use case may be moot (just don't use nested sapply in the first place).

twieg
  • 53
  • 2
  • 9