4

For a list of 10 ints, there are 10! possible orders or permutations. Why does random.shuffle give duplicates after only 5000 tries?

>>> L = range(10)
>>> rL = list()
>>> for i in range(5000):
...     random.shuffle(L)
...     rL.append(L[:])
... 
>>> rL = [tuple(e) for e in rL]
>>> len(set(rL))
4997
>>> for i,t in enumerate(rL):
...     if rL.count(t) > 1:
...         print i,t
... 
102 (7, 5, 2, 4, 0, 6, 9, 3, 1, 8)
258 (1, 4, 0, 2, 7, 3, 5, 9, 6, 8)
892 (1, 4, 0, 2, 7, 3, 5, 9, 6, 8)
2878 (7, 5, 2, 4, 0, 6, 9, 3, 1, 8)
4123 (5, 8, 0, 1, 7, 3, 2, 4, 6, 9)
4633 (5, 8, 0, 1, 7, 3, 2, 4, 6, 9)
>>> 10*9*8*7*6*5*4*3*2
3628800
>>> 2**19937 - 1
431542479738816264805523551633791983905393 [snip]

>>> L = list()
>>> for i in range(5000):
...     L.append(random.choice(xrange(3628800)))
... 
>>> len(set(L))
4997

Edit: FWIW, if the probability of not having two the same for a single pair is: p = (10! - 1) / 10! and the number of combinations is: C = 5000! / 4998! * 2! = 5000 * 4999 / 2 then the probability of having a duplicate is:

>>> import math
>>> f = math.factorial(10)
>>> p = 1.0*(f-1)/f
>>> C = 5000.0*4999/2
>>> 1 - p**C
0.96806256495611798
Riking
  • 2,389
  • 1
  • 24
  • 36
telliott99
  • 7,762
  • 4
  • 26
  • 26
  • 4
    http://xkcd.com/221/ that's your answer :) – Madi D. Jan 23 '10 at 21:18
  • 4
    Due to the Birthday Paradox (google it), there is a high probability for dups among only N**0.5 items chosen from N possibilities. Here N=10! which predicts dups after (10!)**0.5 ~= 1900 tries. See http://stackoverflow.com/questions/2124347/how-to-generate-permutations-of-array-in-python/2124365#2124365 for code that avoids repetitions. – Beni Cherniavsky-Paskin Jan 23 '10 at 21:27
  • Thanks, all. I actually know the Birthday Paradox but discounted it. I didn't appreciate the N**0.5 dependence to get a 50% chance of a dup. – telliott99 Jan 23 '10 at 21:50

3 Answers3

20

It's called the Birthday Paradox.

According to this formula from Wikipedia:

but replacing 365 with 10! you would only need about 2200 examples to have a 50% chance of a collision, and you are way above that.

Community
  • 1
  • 1
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 3
    which, if you crunch the numbers, shows that there's only about a 3% chance of getting 5000 distinct values when choosing 5000 from a set of 3628800. so there's a 97% chance that when you construct a set from your results you'd get something less than 5000. – Autoplectic Jan 23 '10 at 21:35
6

Because it's... random! If you want all permutations just use itertools.permutations.

Alex Gaynor
  • 14,353
  • 9
  • 63
  • 113
2

maybe because is RANDOM? Random does not mean does not repeat, it means it is RANDOM, which means theoretically it could return the exact same answer every time, not likely but possible.