I have a list with around 3900 elements that I need to randomly permute to produce a statistical distribution. I looked around and found this Maximal Length of List to Shuffle with Python random.shuffle that explains that the period of the PRNG in Python is 2**19937-1
, which leads to a list with a maximum length of 2080
before it becomes impossible to generate all possible permutations. I am only producing 300-1000 permutations of the list so it unlikely that I will be producing duplicate permutations, however, since this is producing a statistical distribution I would like to have all possible permutations as potential samples.

- 519
- 2
- 11
-
You want to produce all possible permutations of 3900 elements? – Saeid Dec 07 '15 at 17:13
-
Have you considered using `os.urandom`? urandom is suitable for crypto, so I figure it should work for whatever you're doing. – senshin Dec 07 '15 at 17:19
-
5For statistical purposes - in fact, for essentially any purposes - the limitation on possible permutations is no issue. If you're worried about this, you have much bigger deviations from an ideal distribution to worry about. – user2357112 Dec 07 '15 at 17:20
-
You could perhaps use something cryptographically secure: http://crypto.stackexchange.com/questions/12436/what-is-the-difference-between-csprng-and-prng – Aske Doerge Dec 07 '15 at 17:22
-
2As an example of "much bigger deviations" to worry about, [in Python 2, `random.sample`-ing `xrange(1, sys.maxsize)` always returns a number congruent to 1 mod `2**10`.](http://stackoverflow.com/questions/33602014/random-number-in-the-range-1-to-sys-maxsize-is-always-1-mod-210) Also, for people suggesting cryptographic PRNGs, even *those* don't have the period to produce all possible permutations, and even the cryptographers still don't care. – user2357112 Dec 07 '15 at 17:31
-
@user2357112 Yeah the `1 mod 2**10` seems like a much bigger worry since I am only sampling a few hundred permutations. Luckily, the conversion to Python 3 should be easy since I already import from future. – Darwin Dec 07 '15 at 18:45
-
Would permuting the previous permutation instead of the original list give you 2^19937-1 different permutations times 2^19937-1 different begin states? (Just thinking out loud, correct me if I'm wrong) – m69's been on strike for years Dec 07 '15 at 21:03
-
1@m69 No, it would just give you a different set of 2^19937. If you permuted the second time with a different PRNG, you might in the best case add their periods (if there's no interaction between them). – Lee Daniel Crocker Dec 08 '15 at 01:40
2 Answers
There are longer-period PRNGs than MT, but they are hard to find.
To get all 3090! combinations, you need 40,905 bits of entropy. That's about 5kb. You should be able to grab a chunk of bytes that size from someplace like random.org many times with no problem. To get precisely balanced, you'll have to add some and do rejection sampling. I.e., grab 12 bits at a time (0..4095), and reject numbers higher than your current loop index. That might inflate the number of bits needed, but probably not beyond 8kb.

- 12,927
- 3
- 29
- 55
I agree with @user2357112 that it is unlikely to be a genuine issue -- but it seems like you should be able to use the standard random
module in such a way that all permutations are at least possible.
You could do a divide-and-conquer approach. Use the initial seed to partition the list into 2 lists of around 2000 each. The number of such partitions is roughly C(4000,2000)
which is approximately 1.66 x 10^1202
. This is less that the period, which suggests that it is at least possible for all such partitions to be generated with random.sample()
. Then - reseed the random number generator and permute the first half. Then -- reseed a second time and permute the second half. Perhaps throw in little time delays before the reseedings so you don't run into issues involving the resolution of your system clock. You could also experiment in randomly partitioning the initial list into larger numbers of smaller lists.
Mathematically, it is easy to see that if you randomly partition a list into sublists so that each partition is equally likely and then you permute each sublist in such a way that all sublist permutations are equally likely, and glue together these sublist permutations to get a whole-list permutation, then all whole-list permutations are equally likely.
Here is an implementation:
import random, time
def permuted(items, pieces = 2):
sublists = [[] for i in range(pieces)]
for x in items:
sublists[random.randint(0,pieces-1)].append(x)
permutedList = []
for i in range(pieces):
time.sleep(0.01)
random.seed()
random.shuffle(sublists[i])
permutedList.extend(sublists[i])
return permutedList
I'm not sure that time.sleep(0.01)
is really needed. My concern was that if the reseeds happened within a millisecond then on some systems the same seed might be used.
As a final remark, just because the above function (with suitable choice of pieces
) can't be shown to miss certain permutations by a simple counting argument (comparing the number of permutations with the number of initial states) this doesn't in and of itself constitute proof that all permutations are in fact possible. That would require a more detailed analysis of the random number generator, the hash function that seeds it, and the shuffle algorithm.

- 51,337
- 7
- 54
- 119
-
This seems to be a very good solution. Could you provide a code example for randomly partitioning the original list? – Darwin Dec 09 '15 at 00:15