1

I am trying to create the unique permutations of the vector

 c(rep(0,20),rep(1,20))

but permn in the combinat package says "Error in vector.... vector size specified is too large".

I also tried the script uniqueperm2 from this question and got a "vector specified is too large" error again.

In my understanding permn would be creating 40! permutations.

I might be able to solve the overall problem I am working on just by finding the number of unique permutations of this vector.

Is the problem of finding the unique permutations of a length 40 vector too big for R and can anyone explain to me how to figure out the number of unique permutations of the listed vector?

Community
  • 1
  • 1
Boyd Johnson
  • 342
  • 7
  • 11

3 Answers3

2

Each unique permutation of this vector corresponds to a subset of size 20 from {1, 2, ..., 40} (i.e, the indices of the 1s).

The number of these subsets is "40 choose 20", or 40! / (20! x 20!). According to Google calculator this equals 137,846,528,820

qdjm
  • 1,033
  • 10
  • 14
1

As you have already been told, the number of permutations can be computed as:

R> choose(40,20)
[1] 137846528820

However, the longest vector that R can currently handle is 2^31 - 1 elements long, which is:

R> 2^31 - 1
[1] 2147483647

which is way smaller than the number of permutations you want to generate. Hence the errors you are getting and the error you;d get if you tried this with the standard function to generate these permutations combn():

R> combn(40, 20)
Error in matrix(r, nrow = len.r, ncol = count) : 
  invalid 'ncol' value (too large or NA)
In addition: Warning message:
In combn(40, 20) : NAs introduced by coercion

At this point you'll have to resort to writing code to generate the permutations in a bath-like manner and investigate one of the many big-data packages on R (see the High Performance Computing task view).

Or, and this would be my suggestion, consider what you are possibly going to be able to do with 137 billion! (American) permutations and then take another approach. If you could deal with 1 per second (i.e. do something meaningful with the permutation that took 1 second) you'd still be working on the results in 4000 years time!

So why do you want all the permutations? Would a smaller random set suffice?

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • It was for a problem dealing with the number of routes from the top left corner of a 20x20 square area to the bottom right corner with no backtracking making 1unit moves at a time. I wanted to generate the routes {1,22,...,441} for example. But since the number is so large I'll settle for just know the number of routes. – Boyd Johnson Nov 03 '12 at 00:50
0

There are (40 choose 20) unique permutations (i.e. choose 20 of the 40 positions for the 0s and use the other 20 for the 1s.) That's still quite a large number, but you could try combn(40, 20)

rici
  • 234,347
  • 28
  • 237
  • 341
  • Not in any R I am familiar with, except perhaps bits of R Devel currently. The longest vector in R currently is 2^31 - 1 elements long. `combn(40,20)` would yield `choose(40,20)` combinations (actually it would yield a vector of total length `2 * choose(40,20)` as it stores the result as a matrix. This is far, far too big to be handled by R's tools currently. – Gavin Simpson Nov 02 '12 at 19:53
  • @GavinSimpson: indeed. I should have computed the actual value. Still, it's only huge. By the way, I'd be surprised if it takes R a whole second to do anything meaningful with 20 numbers, even given what I've heard about R's lack of speed. If you could do something meaningful with each permutation in a microsecond -- like toss it out of consideration -- then you could get through the lot in a couple of days, and that's without any parallelization at all. (Oh, and it's 0.138 yanqui trillion. fwiw) – rici Nov 02 '12 at 22:40
  • @rico thanks for pointing out that I can't count. Fixed now. As for other part; yes, there are ways round this and parallel processing will help, as will the on disk storage options. I was being conservative with the time per permutation assuming a non trivial computation. R is generally quite quick; sometimes user friendliness gets in way though. – Gavin Simpson Nov 03 '12 at 10:23