5

This recent question about sorting randomly using C# got me thinking about the way I've sometimes shuffled my arrays in Perl.

@shuffled = sort { rand() <=> rand() } @array;

The proposed solution in the mentioned question is Fisher-Yates shuffle, which works in a linear time.

The question is: how efficient is my snippet and is such shuffle "really" random?

Community
  • 1
  • 1
Tuminoid
  • 9,445
  • 7
  • 36
  • 51
  • Thanks for everyone confirming what I had suspected myself as well; it is stretching sort badly and its not correct way to shuffle an array. I've never used it in a critical spot (rarely shuffling is critical anyhow), but as a quick hack for testing. But still in some twisted way its elegant ;) – Tuminoid Dec 17 '08 at 19:41
  • That new title brian put changes the nature and the purpose of the discussion, rolling it back. – Tuminoid Dec 18 '08 at 09:56
  • brian changed the title because this is a very FAQ on Perl mailing lists and #perl on IRC, so frequently asked that perlfaq4 includes the answer. – converter42 Dec 18 '08 at 17:15

8 Answers8

9

I'm not a Perl internals expert, so I don't know how "sort" will work here. However, most sort functions expect consistency in their comparisons, and I would expect them to work unpredictably if the function is itself random. Unfortunately, unpredictability is not the same as randomness, and so I'd have no confidence in your shuffled array. It may tend to put elements in some sort of order, just as a hastily created and complicated recurrence relation may not be random.

Rather than analyze the sort function, I'd suggest going with Fisher-Yates.

As Knuth said, randomness is too important to be left to chance.

David Thornley
  • 56,304
  • 9
  • 91
  • 158
9
$ perldoc List::Util
⋮
  shuffle LIST
       Returns the elements of LIST in a random order

           @cards = shuffle 0..51      # 0..51 in a random order
⋮

That would be my suggestion.

derobert
  • 49,731
  • 15
  • 94
  • 124
8

I'm actually a bit surprised that your proposed shuffle works. In the implementation of the Perl sort function, it tries to get the elements of the array into ascending order depending on the value of your comparison function. The problem is, your comparison function doesn't return consistent answers! Sometimes it might say "foo" lt "bar", while other times it might say "bar" lt "foo". This has the possibility of confusing the sorting algorithm to the point where it never terminates, or terminates with a fatal error, or some other catastrophic failure.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
4

The perl documentation on sort says this

The comparison function is required to behave. If it returns inconsistent results (sometimes saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results are not well-defined.

So it's a bad idea to do that.

ETA: I just did a benchmark. On a 100000 element array, using a FY-shuffle is more than 10 times faster too.

Leon Timmermans
  • 30,029
  • 2
  • 61
  • 110
4

For one thing, you know that no matter the comparator you use sort() can't possibly be faster than O(n log n). So even if the shuffle it performs is fair, its performance will be worse.

So is the shuffle fair? It's obviously not fair for some (easy to analyze) sorting algorithms. Consider a simple bubble sort - in order for an element to move from one end to the other, the comparison function has to evaluate positive for n consecutive calls - a 1 in 2^n probability for what should be a 1 in n event. For quick sort, it's hard to analyze and it's possible that ends up being fair. But if it's important that it be right, do it the right way.

bmm6o
  • 6,187
  • 3
  • 28
  • 55
3

This is just intuition, but I think that using a sort like this will produce a set whose order is to some degree dependent on the order of the original set. The results of a truly random sort should not depend at all on the order of the original set. I can't explain why/how, maybe somebody else can (or show that it is in fact random)?

As to how efficient it is, I'm not sure, but it's probably not a whole lot less efficient than any other sort using sort, since AFAIK rand() is relatively cheap. I could be wrong there, though.

Adam Bellaire
  • 108,003
  • 19
  • 148
  • 163
  • I've gathered just as much as you, but they're just hunches as well. I was hoping someone would, at least pretend to, know :) – Tuminoid Dec 17 '08 at 18:13
  • @Tim: I don't think the rand() <=> rand() is a shuffle, it is not a correct algorithm. Its results are unpredictable, but not random. That aside, you are re-iterating what I stated. If you are using a truly random sort (a shuffle), then initial ordering is not relevant. – Adam Bellaire Dec 17 '08 at 18:31
3

There's a better Fisher-Yates shuffle function that doesn't use the sort builtin in the perlfaq4: How do I shuffle an array randomly?.

Powerlord
  • 87,612
  • 17
  • 125
  • 175
brian d foy
  • 129,424
  • 31
  • 207
  • 592
  • Kudos for pointing to Perl's awesome documentation. I might add that perldoc -q shuffle from the command line is another way to find the same information. – converter42 Dec 17 '08 at 23:14
0

@shuffled = map {
  $_->[1]
} sort {
  $a->[0] <=> $b->[0]
} map {
  [ rand(), $_ ]
} @array;
PP.
  • 10,764
  • 7
  • 45
  • 59