1

How is the random state updated after calling np.random.shuffle()? and how is it impacted by the length of the list to be shuffled?

Here is the experiment:

np.random.seed(3)
print(np.random.get_state()[0],str(np.random.get_state()[1][:5]))
delta = 7
for t in [3,7,8, 9, 10]:
    print('-'*20)
    x = np.power(2, t)
    np.random.seed(3)
    a =np.arange(x-delta)
    np.random.shuffle(a)
    print(x-delta, np.random.get_state()[0],str(np.random.get_state()[1][:5]))
    np.random.seed(3)
    a =np.arange(x)
    np.random.shuffle(a)
    print(x, np.random.get_state()[0],str(np.random.get_state()[1][:5]))
    np.random.seed(3)
    a =np.arange(x+delta)
    np.random.shuffle(a)
    print(x+delta, np.random.get_state()[0],str(np.random.get_state()[1][:5]))

and the result:

MT19937 [         3 1142332464 3889748055 3734916391 3619205944]
--------------------
1 MT19937 [         3 1142332464 3889748055 3734916391 3619205944]
8 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
15 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
121 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
128 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
135 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
249 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
256 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
263 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
505 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
512 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
519 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
--------------------
1017 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]
1024 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]
1031 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]

Thank you very much.

user7586189
  • 1,249
  • 8
  • 17

1 Answers1

2

Quickly looked into NumPy 0.14.x sources. Apparently Fisher-Yates is used, in-place version, you could take a look at Efficiently yield elements from large list in (pseudo) random order how algorithm works. The only difference I see is that NumPy is doing shuffle in reverse order (going from n down to 1).

So number of calls to integer in-range RNG is equal to array length. But here is the catch - to get that integer internal number generator might be called more than once, there is a while loop here.

So, moral of the story - to shuffle array of size n, RNG would be called >= n times.

UPDATE

Here what interval RNG looks like (Win10, x64):

unsigned long rk_interval(unsigned long max, rk_state *state) {
    unsigned long mask = max, value;

    if (max == 0) {
        return 0;
    }
    /* Smallest bit mask >= max */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;

    /* Search a random value in [0..mask] <= max */
    while ((value = (rk_ulong(state) & mask)) > max);

    return value;
}

rk_ulong(state) is pretty much Mersenne Twister in some wrapping

You see, there is a mask build, but masked values could be outside max, so while loop is necessary, and this is what makes # of calls to MT >= number of items in the array to shuffle.

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • It make sense that the times RNG is called = size of the list. But when the list is small, (<263 in my example) the times RNG is called does not seem equal to the size of the list – user7586189 Mar 25 '18 at 15:41