unique permutations of zero-one-vector in R

Question

My problem is as follows:

Imagine we have a vector (1,1,1,...,0,0) of length n with k ones in the beginning. Think of this vector as of vector with realizations of some variables L1 till Ln. What i need to calculate is

sum over all unique permutations of (1,1,1,...,0,0) of Function(L1,...,Ln)

I have searched for solutions of my problem and yes, there are some, which work as long as n isn't too big.

As long as n is under 30 my PC doesn't die and following idea works:

1) creating a data.frame of all unique permutations with a help of following code (found it here)

uniqueperm2 <- function(d) {
  dat <- factor(d)
  N <- length(dat)
  n <- tabulate(dat)
  ng <- length(n)
  if(ng==1) return(d)
  a <- N-c(0,cumsum(n))[-(ng+1)]
  foo <- lapply(1:ng, function(i) matrix(combn(a[i],n[i]),nrow=n[i]))
  out <- matrix(NA, nrow=N, ncol=prod(sapply(foo, ncol)))
  xxx <- c(0,cumsum(sapply(foo, nrow)))
  xxx <- cbind(xxx[-length(xxx)]+1, xxx[-1])
  miss <- matrix(1:N,ncol=1)
  for(i in seq_len(length(foo)-1)) {
    l1 <- foo[[i]]
    nn <- ncol(miss)
    miss <- matrix(rep(miss, ncol(l1)), nrow=nrow(miss))
    k <- (rep(0:(ncol(miss)-1), each=nrow(l1)))*nrow(miss) + 
               l1[,rep(1:ncol(l1), each=nn)]
    out[xxx[i,1]:xxx[i,2],] <- matrix(miss[k], ncol=ncol(miss))
    miss <- matrix(miss[-k], ncol=ncol(miss))
  }
  k <- length(foo)
  out[xxx[k,1]:xxx[k,2],] <- miss
  out <- out[rank(as.numeric(dat), ties="first"),]
  foo <- cbind(as.vector(out), as.vector(col(out)))
  out[foo] <- d
  t(out)
}

2) sum over components of this data.frame

Sadly in my problems n is 100 and above. Good news for me are that i actually do not need whole data.frame in my RAM. An algorithm which would remember last permutation, use it to evaluate Funktion(L1,...,Ln)and compute next permutation and so on in a loop would be enough. Any help is appreciated.

EDIT Hack-R asked for an example, here what i get

    > d <- c()
    > d[1:25]=0
    > d[25:50]=1
    > d
     [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> uniqueperm2(d)
Error: cannot allocate vector of size 905608.1 Gb
In addition: Warning messages:
1: In vector("list", count) :
  Reached total allocation of 8109Mb: see help(memory.size)
2: In vector("list", count) :
  Reached total allocation of 8109Mb: see help(memory.size)
3: In vector("list", count) :
  Reached total allocation of 8109Mb: see help(memory.size)
4: In vector("list", count) :
  Reached total allocation of 8109Mb: see help(memory.size)

I wonder why this is such a strain on your computational resources? I used `proc.time` to time this function with n = 100 and it completed almost instantly on my laptop. The results were `user system elapsed 0.00 0.00 0.06` . I ran it with n = 1000 and the results were `user system elapsed 0.04 0.00 0.48`. You just run this function on a vector of 1's and 0's right? Like `d <- rep(1,0,100)`? Perhaps you can provide an example vector to use if I have misunderstood? — Hack-R, Sep 08 '16 at 01:41
@Hack-R Which vector did you use because there should be n!/(k!*(n-k)!) unique permutations so for k = 30, n = 100 you should get 10^25 vectors and that's a lot for .06 second :o. But I may have missed something doing the maths. — user1470500, Sep 08 '16 at 01:45
@Hack-R hey, try with a vector (40 times 1, 60 times 0). Script is fast, but such dataframe needs several gigabyte ram — holic, Sep 08 '16 at 01:45
@holic Thanks. I'm not sure I understand the vector you're describing. Can you please create the vector using R code within your question? **Edit** Oh wait do you mean like this? `d <- c(rep(1,40), rep(0,60))` — Hack-R, Sep 08 '16 at 01:46
@holic what is the Funktion you want to compute? I think for input such as n = 100, k = 50 you will need a work around and not compute the functions for each unique permutation. — user1470500, Sep 08 '16 at 01:48
@user1470500 i would gladly post the formula, but it seems laTeX code isn't accepted here. Formula is pretty sick though, those Variables L1, ... Ln are inside of an integral. I dont worry if script needs an hour or two to compute as long as it computes :) — holic, Sep 08 '16 at 01:56
n = 100, k = 50, you get 10^29 unique permutation meaning it is more a matter of billion years than hours of computation. But if k is small it is an all different matter. — user1470500, Sep 08 '16 at 02:06
@user1470500 if you right, then it is not a big problem. This formula was a result of theorical modelling in my masterthesis. Billion years is ofc unacceptable, so i can say that this formula shoudnt be used for big n. — holic, Sep 08 '16 at 02:16
@holic: the value of `k` is absolutely critical. If you have `n=100`, `k=30` and you want to finish your computation in a year you will need to evaluate the function `choose(100,30)/(365*24*60*60)` = about 10^18 times *per second*. I think enumerating the permutations is the least of your problems ... can you say for what values of `n` and `k` you are *actually* needing to do this computation ... ? — Ben Bolker, Sep 08 '16 at 02:18
@BenBolker hey Ben, i think as long as i can provide a working script in my masterthesis and explain the problematic of computation for big k it will be enough. Not every theoretical model can be implemeted well. n stands for amount of positions in portfolio and goes up to 1000 and k reresents the amount of defaulted positions and goes therefor every value from 0 to n. — holic, Sep 08 '16 at 02:26

score 1 · Accepted Answer · answered Sep 08 '16 at 02:00

Here's one way to walk the permutations. I still think there is a better way but haven't figured it out yet.

This function looks at an array of 1's an 0's and tries to move the right most 1 to the left if possible. (Basically thinking of the vector as a binary number and trying to find the next largest number with exactly n bits)

next_x <- function(x) {
    i <- tail(which(diff(x)==1),1)
    if (length(i)>0) {
        x[c(i, i+1)]<-c(1,0)
        x[(i+1):length(x)] <- sort(x[(i+1):length(x)])
    } else {
        stop("no more moves")
    }
    x
}

You start out with x all to the right and you can iterate with

x <- c(0,0,0,0,1,1,1)
while(!all(x==c(1,1,1,0,0,0,0))) {
    x <- next_x(x)
    print(x)
}

Big Thank you MrFlick, i made some runs, looks very good. I will try to modify your script a bit to run with n and k given. Will accept your answer if i am succesful. — holic, Sep 08 '16 at 02:11

Randy Lai · Answer 2 · 2016-09-11T18:56:21.760

iterpc is another solution

k <- 5
n <- 10
library(iterpc)
it <- iterpc(c(k, n-k), ordered=TRUE)
while (!is.null(x <- getnext(it))){
  print(x)
}

PS: The default labels are 1 and 2 instead of 0 and 1.

Simple benchmark shows that iterpc is at least 2x faster than next_x when n=10, k=5

Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval
 next_x 11.663353 12.599623 13.665913 13.532414 14.411556 17.619208   100
 iterpc  4.987268  5.325663  5.939558  5.613265  6.572008  8.685916   100

unique permutations of zero-one-vector in R

2 Answers2