How is random state updated after calling np.random.shuffle()? and how is it impacted by the length of the list to be shuffled?

Question

How is the random state updated after calling np.random.shuffle()? and how is it impacted by the length of the list to be shuffled?

Here is the experiment:

np.random.seed(3)
print(np.random.get_state()[0],str(np.random.get_state()[1][:5]))
delta = 7
for t in [3,7,8, 9, 10]:
    print('-'*20)
    x = np.power(2, t)
    np.random.seed(3)
    a =np.arange(x-delta)
    np.random.shuffle(a)
    print(x-delta, np.random.get_state()[0],str(np.random.get_state()[1][:5]))
    np.random.seed(3)
    a =np.arange(x)
    np.random.shuffle(a)
    print(x, np.random.get_state()[0],str(np.random.get_state()[1][:5]))
    np.random.seed(3)
    a =np.arange(x+delta)
    np.random.shuffle(a)
    print(x+delta, np.random.get_state()[0],str(np.random.get_state()[1][:5]))

and the result:

MT19937 [         3 1142332464 3889748055 3734916391 3619205944]
--------------------
1 MT19937 [         3 1142332464 3889748055 3734916391 3619205944]
8 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
15 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
121 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
128 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
135 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
249 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
256 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
263 MT19937 [2266350226  522119106 3046352735  732669494 2548320174]
--------------------
505 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
512 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
519 MT19937 [3210938781 3041878801 2995991318 2989044749 4131327847]
--------------------
1017 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]
1024 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]
1031 MT19937 [2643427254 2135041851 1650564992  768318449  937622320]

Thank you very much.

Severin Pappadeux · Accepted Answer · 2018-03-25T20:44:45.233

Quickly looked into NumPy 0.14.x sources. Apparently Fisher-Yates is used, in-place version, you could take a look at Efficiently yield elements from large list in (pseudo) random order how algorithm works. The only difference I see is that NumPy is doing shuffle in reverse order (going from n down to 1).

So number of calls to integer in-range RNG is equal to array length. But here is the catch - to get that integer internal number generator might be called more than once, there is a while loop here.

So, moral of the story - to shuffle array of size n, RNG would be called >= n times.

UPDATE

Here what interval RNG looks like (Win10, x64):

unsigned long rk_interval(unsigned long max, rk_state *state) {
    unsigned long mask = max, value;

    if (max == 0) {
        return 0;
    }
    /* Smallest bit mask >= max */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;

    /* Search a random value in [0..mask] <= max */
    while ((value = (rk_ulong(state) & mask)) > max);

    return value;
}

rk_ulong(state) is pretty much Mersenne Twister in some wrapping

You see, there is a mask build, but masked values could be outside max, so while loop is necessary, and this is what makes # of calls to MT >= number of items in the array to shuffle.

It make sense that the times RNG is called = size of the list. But when the list is small, (<263 in my example) the times RNG is called does not seem equal to the size of the list — user7586189, Mar 25 '18 at 15:41

How is random state updated after calling np.random.shuffle()? and how is it impacted by the length of the list to be shuffled?

1 Answers1