11

Here's a C implementation of Fisher-Yates that I want to use in a deck-shuffling routine. Am I doing this correctly (n = length of array)?

Note: The do-while loop attempts to correct for the modulo bias (see here). It adds a bit of overhead to the procedure and could be eliminated if you don't care about the low-bit bias.

void shuffle(int *array, int n) {

  int i, j, tmp, upper_bound;

  srand(time(NULL));

  for (i = n - 1; i > 0; i--) {

    upper_bound = RAND_MAX - ((RAND_MAX % (i + 1)) + 1);

    do {
      j = rand() % (i + 1);
    } while (j > upper_bound);

    tmp = array[j];
    array[j] = array[i];
    array[i] = tmp;   
  }
}
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
MikeRand
  • 4,788
  • 9
  • 41
  • 70
  • It just popped into my head that `int lim = RAND_MAX-i;` ... `} while (j>upper_bound && --lim);` might be a suitable way to catch the _it_ _can_ _never_ _happen_ case of repeated out of range random numbers. – nategoose Jul 27 '10 at 15:05

1 Answers1

28

First, you should extract the code for generating a random number that's equally distributed between 0 (inclusive) and n (exclusive) to a separate function. That's a nice task of work that you will need elsewhere, too.

Second, I would not call srand inside the shuffle function but depend on the caller on initializing the random number generator. That way you can shuffle a deck more than once in a second.

Third, you should do the test for j > upper_bound before dividing by i + 1. It's unlikely that i will ever be near RAND_MAX.

static int rand_int(int n) {
  int limit = RAND_MAX - RAND_MAX % n;
  int rnd;

  do {
    rnd = rand();
  } while (rnd >= limit);
  return rnd % n;
}

void shuffle(int *array, int n) {
  int i, j, tmp;

  for (i = n - 1; i > 0; i--) {
    j = rand_int(i + 1);
    tmp = array[j];
    array[j] = array[i];
    array[i] = tmp;
  }
}

To check whether this implementation may be correct, you need to ensure that you asked the random number generator for log2(n!) bits of randomness. In other words, the product of all the ns given to the rand_int function must be n!.

Roland Illig
  • 40,703
  • 10
  • 88
  • 121
  • +1 for the comment on seeding. It's also a "security" issue if external users of your code can predict/influence your shuffle by controlling timing. – Darron Jul 27 '10 at 21:41
  • What does it mean to have the caller initialize the random number? Is that like depending on the caller's mouse movements or something? – MikeRand Jul 29 '10 at 04:13
  • 4
    No. What it means is: Any experienced programmer would not expect a routine called `shuffle` to reset the random number generator to a specific state. That's just not included in the word `shuffle`. The only point where you should call `srand()` is in the `main` function. As a guide, just ask yourself: "What does this function do?" The answer for the `shuffle` function would be: "It shuffles the given array *and resets the random number generator*." This alone should sound weird enough. – Roland Illig Jul 29 '10 at 04:33
  • 1
    Eliminate need for 'tmp' by using std::swap in : ... j = rand_int(i+1); std::swap(array[i],array[j]); – KomodoDave Jul 07 '12 at 19:08
  • 3
    @KomodoDave This would save you a single unit of memory, but at the expense of the overhead and context switching involved in calling the `swap` function. Also, this appears to be pure C, not C++, so it likely isn't an option anyways. – Cloud Jun 04 '14 at 01:39
  • Note that to isolate this code from other random number generation, you might want to look into using `nrand48()`, and provide one method to set a random seed and another to generate a random seed automatically from `/dev/urandom` or something similar. This allows you to decouple the shuffling from any other random number generation in the program (but beware `lcong48()`). – Jonathan Leffler Jun 24 '15 at 22:41