1

I am using this code to generate a random permutation of a vector using a variation of Fisher-Yates randomization algorithm (I am going from the first element to the last, not the other way around). I am using a boost::random::mt11213b RNG globally in a program that is seeded with generator.seed(time(NULL)); when program starts, hence a wrapper singleton RandomNumber here.

boost::random::uniform_int_distribution<unsigned long> 
    distribution(0, vec.size()-1);

for (unsigned long i=0;i<vec.size();i++)
    std::swap(vec[i], vec[distribution(RandomNumber::getInstance().generator)]);

Some experiments have led me to believe that there may be an issue in this algorithm, in a nutshell. This is what I did

  1. Created a vector of integer with length 100
  2. Filled first 75 elements with 0 and the last 25 with 1
  3. Shuffled an array.
  4. Took the first 5 elements from the list and summed them.

I repeated this procedure a few thousand times (with a loop, not by hand :)) each time starting with a fresh vector. Then I computed the arithmetic mean of the sums and it came about 0.98 instead of the expected 1.25.

The funny thing is that if I start with a vector that has been shuffled once with the same algorithm instead of an ordered one, the result increases to 1.22 and if I don't discard the vector on each iteration but rather just shuffle it again, the result is around 1.25that is the expected value.

I am unsure as to what could be wrong. The algorithm looks sound, the only thing that I can think of that could have gone wrong is the seeding phase and the

boost::random::uniform_int_distribution<unsigned long> 
    distribution(0, vec.size()-1);

line that is called each time before a vector is shuffled (perhaps it should only be called once a program but that doesn't make mush sense)

Any help will be greatly appreciated!

  • 1
    If I had to take a guess as to the cause, you're not changing the distribution size each time around the loop. The *Art of Computer Programming* algorithm is [here](http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm). – OmnipotentEntity Apr 23 '12 at 19:17
  • That wouldn't make sense. The vector is of a fixed length. –  Apr 23 '12 at 19:19
  • 1
    Yes, but once you shuffle up to n elements, you don't want to touch the first n again, because repeated applications of pseudo random numbers don't making things more random, they make them less random. – OmnipotentEntity Apr 23 '12 at 19:21
  • You list the steps taken in the experiment - it sounds like the code (or at least code that performs exactly the same as your description) should be short enough to post. That way OmnipotentEntity wouldn't have to guess (but with a name like that, I'm surprised there's any guesswork involved anyway :) ). – Michael Burr Apr 23 '12 at 19:22
  • Oh Jesus, I completely misunderstood the algorithm when I first read it. Thank you, you can post an answer, so I can accept. –  Apr 23 '12 at 19:22
  • Does it function correctly? I don't want to post an answer unless it does. And I'm too lazy to test it myself. ;) – OmnipotentEntity Apr 23 '12 at 19:23
  • It would be most impossible to give the actual code as it is written in a custom programming language that I am writing an interpreter for :) –  Apr 23 '12 at 19:24

2 Answers2

3

If I had to take a guess as to the cause, you're not changing the distribution size each time around the loop. The Art of Computer Programming algorithm is here.

Once you shuffle up to n elements, you don't want to touch the first n again, because repeated applications of pseudo random numbers don't making things more random, they make them less random.

OmnipotentEntity
  • 16,531
  • 6
  • 62
  • 96
2

No your algorithm is wrong. Consider the simple case with a vector of 4 numbers, your algorithm returns the following biased results

result      probability * 256
{1,2,3,4}   10
{1,2,4,3}   10
{1,3,2,4}   10
{1,3,4,2}   14
{1,4,2,3}   11
{1,4,3,2}   9
{2,1,3,4}   10
{2,1,4,3}   15
{2,3,1,4}   14
{2,3,4,1}   14
{2,4,1,3}   11
{2,4,3,1}   11
{3,1,2,4}   11
{3,1,4,2}   11
{3,2,1,4}   9
{3,2,4,1}   11
{3,4,1,2}   11
{3,4,2,1}   10
{4,1,2,3}   8
{4,1,3,2}   9
{4,2,1,3}   9
{4,2,3,1}   8
{4,3,1,2}   10
{4,3,2,1}   10

While a standard Fisher-Yates algorithm will give a uniform probability for all results.

If you want to shuffle a vector, use std::random_shuffle directly (see Using boost::random as the RNG for std::random_shuffle for some example codes).

Community
  • 1
  • 1
kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
  • Yes, as OmnipotentEntity pointed out, I failed at implementing the algorithm correctly. –  Apr 23 '12 at 19:33