3

I often come across the need for "shuffling" a python list in a manner similar to using the sorted function, i.e. return a shuffled copy of the list without mutating the original.

I know one can shallow-copy the list and then shuffle:

a = [random.randint(1, 1000) for _ in xrange(100)]
b = a[:]
random.shuffle(b)

The functional programmer in me wants a solution that is not mutative (and fitting into a single readable line is a big bonus), and so I want to use the sample function:

a = [random.randint(1, 1000) for _ in xrange(100)]
b = random.sample(a, len(a))

However, nowhere in the documentation can I find clues as to whether this is a good idea in terms of costs, both time- and space-wise, asymptotic or constant factor.

The post What is the difference between random.sample and random.shuffle in Python appears to be asking a similar if broader question, but a) does not have an accepted answer, and more importantly, b) none of the answers provide any mention of what the relative merits are in terms of time/space.

All 4 answers apart from the most-voted answer specifically talk about output, rather than any algorithmic complexity.

The Most Voted answer has the python source code for the two functions, and concludes:

the randomization is essentially done by the line int(random() * n). So, the underlying algorithm is essentially the same

I think this would suggest that output is the same, but in terms of time complexity, it doesn't really address the issue as the surrounding code is significantly different.

So my question is: what is the time/space cost difference, preferably both asymptotic and in terms of constant factors, between random.shuffle on a shallow copy, and random.sample(x, len(x))?

NB I recognize the title is slightly inaccurate because it doesn't make explicit the fact that I want a comparison of random.shuffle(x) where x is already a shallow copy. This is purely because of a need for brevity.

Community
  • 1
  • 1
zehnpaard
  • 6,003
  • 2
  • 25
  • 40
  • hmm, so it's marked a duplicate of a question I specifically referred to in my question as not answering my question, with reasons why? – zehnpaard Nov 02 '15 at 11:03
  • That question's answers specifically say that `sample` and `shuffle` have equivalent complexity because they work pretty much the same way. The only difference is that one is in-place and the other is not. – TigerhawkT3 Nov 02 '15 at 11:03
  • really? the code definitely *looks* very different - the `sample` code even has a clause for creating a `selected` set, altogether missing in `shuffle`. I would have thought that the fact that the two functions use the same randomization algorithm wouldn't entail time costs being the same, just the output. – zehnpaard Nov 02 '15 at 11:08
  • The complexity is the same, it is `O(n)`, but `sample` has higher constant factors because it needs to track the already visited items. If you need to shuffle the entire list, then use `shuffle` (on a copy). – Antti Haapala -- Слава Україні Nov 02 '15 at 12:35
  • @AnttiHaapala Many thanks, makes sense – zehnpaard Nov 04 '15 at 03:52

0 Answers0