Why does random.shuffle fail on numpy lists?

Question

I have an array of row vectors, upon which I run random.shuffle:

#!/usr/bin/env python

import random
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz)
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1

print(float(f)/float(iterations))

Between 99.6 and 100% of the time, after random.shuffle(zzz) both rows of zzz are identical, e.g.:

$ ./test.py
...
[[ 0.1  0.2  0.3  0.4  0.5]
 [ 0.1  0.2  0.3  0.4  0.5]]
0.996

Using numpy.random.shuffle appears to pass this test and shuffle row vectors correctly. I'm curious to know why random.shuffle fails.
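For comparison, here is a minimal sketch of the same test using NumPy's shuffle. It uses the newer Generator API (np.random.default_rng, NumPy 1.17+) rather than the legacy np.random.shuffle; both shuffle whole rows along the first axis. The rows get moved but never duplicated:

```python
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

rng = np.random.default_rng()  # Generator API, NumPy >= 1.17

for _ in range(1000):
    rng.shuffle(zzz)  # in place, along axis 0 (whole rows)
    # Rows are permuted, never duplicated.
    assert not np.array_equal(zzz[0], zzz[1])

print(zzz)
```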

You should be giving `random.shuffle` a list, e.g. `zl = list(zzz)` or `zl = zzz.tolist()`. Don't count on a Python function that was designed for lists to handle a 2-D array correctly, especially when it makes in-place changes. — hpaulj, Feb 10 '20 at 00:44
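Here is a sketch of the `tolist()` route from that comment. The rows become plain Python lists, so the swap inside random.shuffle only rebinds list slots. Converting back with np.array is an extra step we added, for when an ndarray is needed afterwards:

```python
import random
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

# tolist() builds nested Python lists that share no memory with zzz.
rows = zzz.tolist()
random.shuffle(rows)          # safe: swaps references between list slots
shuffled = np.array(rows)     # back to an ndarray, if one is needed

print(shuffled)
```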

abc · Accepted Answer · 2020-02-10T00:59:32.127

If you look at the source of random.shuffle, it swaps elements like this:

x[i], x[j] = x[j], x[i]

For a 2-D numpy.ndarray this silently corrupts the data instead of swapping the rows, and no error is raised. Example:

>>> zzz[1], zzz[0] = zzz[0], zzz[1]
>>> zzz
array([[0.1, 0.2, 0.3, 0.4, 0.5],
       [0.1, 0.2, 0.3, 0.4, 0.5]])

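Python does evaluate the right-hand side completely before making any assignment (this is why the one-line swap works in Python). The catch with a 2-D numpy array is what the right-hand side evaluates to. zzz[0] and zzz[1] are views that share memory with zzz, not copies. The first assignment overwrites a row that the second view still points to, so the second assignment copies data that has already been overwritten.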
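Here is a step-by-step expansion of the tuple swap. This is our own unrolling of the statement, not code taken from random.shuffle. It shows that the right-hand side holds views into the array, and that copying it avoids the problem:

```python
import numpy as np

# Unrolled version of `broken[1], broken[0] = broken[0], broken[1]`.
broken = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                   [0.6, 0.7, 0.8, 0.9, 1. ]])
rhs = (broken[0], broken[1])             # 1. right-hand side evaluated first
print(np.shares_memory(rhs[1], broken))  # True: rhs[1] is a view of row 1
broken[1] = rhs[0]                       # 2. row 1 overwritten with row 0
broken[0] = rhs[1]                       # 3. rhs[1] now reads the new row 1
print(broken)                            # both rows hold 0.1 ... 0.5

# Copying the right-hand side removes the aliasing, so the swap works.
fixed = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                  [0.6, 0.7, 0.8, 0.9, 1. ]])
fixed[1], fixed[0] = fixed[0].copy(), fixed[1].copy()
print(fixed)                             # rows swapped
```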

numpy

>>> arr = np.array([[1],[1]])
>>> arr[0], arr[1] = arr[0]+1, arr[0]
>>> arr
array([[2],
       [2]])

Python

>>> l = [1,1]
>>> l[0], l[1] = l[0]+1, l[0]
>>> l
[2, 1]

This was exactly the kind of answer I was hoping to see, which makes the bug clear. Thanks! — Alex Reynolds, Feb 10 '20 at 00:56

score 0 · Answer 2 · answered Feb 10 '20 at 00:40

Try it like this:

#!/usr/bin/env python

import random
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz[0])
    random.shuffle(zzz[1])
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1

print(float(f)/float(iterations))

Thanks, I'm not trying to shuffle elements within the row vector, but was curious why the behavior between the numpy and native Python libraries was different. — Alex Reynolds, Feb 10 '20 at 00:52

score 0 · Answer 3 · answered Feb 10 '20 at 00:59

In [200]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
     ...:                 [0.6, 0.7, 0.8, 0.9, 1. ]])
     ...:
In [201]: zl = zzz.tolist()
In [202]: zl
Out[202]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]

random.shuffle uses an in-place swap like:

In [203]: zzz[0],zzz[1]=zzz[1],zzz[0]
In [204]: zzz
Out[204]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
       [0.6, 0.7, 0.8, 0.9, 1. ]])

Note the replication: the second row has been copied into the first, and the original first row is lost.
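The replication is also permanent, which likely explains why the question's count comes out so close to 100%. This sketch assumes CPython's implementation, where shuffling a two-row array picks j from {0, 1} uniformly. So each call has a 1/2 chance of performing the damaging x[1], x[0] swap. Once that happens, both rows hold the same data and no later shuffle can bring the lost row back:

```python
import random
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

for _ in range(200):
    random.shuffle(zzz)   # the first real "swap" duplicates row 0 into row 1

# With overwhelming probability (all but 2**-200), both rows are now
# copies of the original first row.
print(zzz)
```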

But applied to a list of lists:

In [205]: zl[0],zl[1]=zl[1],zl[0]
In [206]: zl
Out[206]: [[0.6, 0.7, 0.8, 0.9, 1.0], [0.1, 0.2, 0.3, 0.4, 0.5]]
In [207]: zl[0],zl[1]=zl[1],zl[0]
In [208]: zl
Out[208]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]

I tested zl = list(zzz) and still got the array behavior; that zl is a list of views of zzz. tolist makes a list of lists that's totally independent of zzz.
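Here is a quick check of the view-versus-copy distinction between list(zzz) and zzz.tolist(). It uses np.shares_memory and then mutates the original array to show which container sees the change:

```python
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])

views = list(zzz)      # list of 1-D ndarrays, each a view of a row of zzz
copies = zzz.tolist()  # nested Python lists of floats, independent of zzz

print(type(views[0]), np.shares_memory(views[0], zzz))  # <class 'numpy.ndarray'> True
print(type(copies[0]))                                   # <class 'list'>

zzz[0, 0] = 99.0
print(views[0][0], copies[0][0])  # 99.0 0.1: only the view sees the change
```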

In short, random.shuffle cannot handle in-place modification of an ndarray correctly. np.random.shuffle is designed to shuffle along the first dimension of an array, so it gets it right.
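If zzz should stay untouched, the rows can be shuffled out of place along the first axis instead. This is a sketch assuming the Generator API (NumPy 1.17+), where Generator.permutation accepts either an integer or an array:

```python
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])
rng = np.random.default_rng()

# Option 1: shuffle a vector of row indices, then select rows with it.
order = rng.permutation(len(zzz))   # a random ordering of 0 .. len(zzz) - 1
by_index = zzz[order]               # fancy indexing returns a new array

# Option 2: pass the array itself; a shuffled copy along axis 0 comes back.
by_copy = rng.permutation(zzz)

print(by_index)
print(by_copy)
```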

The correct swap for an ndarray uses advanced (list) indexing, which returns a copy of the selected rows rather than a view:

In [211]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
     ...:                 [0.6, 0.7, 0.8, 0.9, 1. ]])
     ...:
In [212]: zzz[[0,1]] = zzz[[1,0]]
In [213]: zzz
Out[213]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
       [0.1, 0.2, 0.3, 0.4, 0.5]])
In [214]: zzz[[0,1]] = zzz[[1,0]]
In [215]: zzz
Out[215]:
array([[0.1, 0.2, 0.3, 0.4, 0.5],
       [0.6, 0.7, 0.8, 0.9, 1. ]])
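That advanced-indexing swap can replace the tuple swap inside a Fisher-Yates loop, giving a row shuffle that is safe for arrays. shuffle_rows is a hypothetical helper written for illustration, not a NumPy or standard-library function:

```python
import random
import numpy as np

def shuffle_rows(x, rng=random):
    """Hypothetical helper: Fisher-Yates shuffle of the rows of an ndarray,
    in place, swapping rows with advanced indexing instead of tuple unpacking."""
    for i in range(len(x) - 1, 0, -1):
        j = rng.randrange(i + 1)   # pick j uniformly from 0 .. i
        x[[i, j]] = x[[j, i]]      # x[[j, i]] is a copy, so no aliasing
    return x

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1. ]])
for _ in range(1000):
    shuffle_rows(zzz)
    assert not np.array_equal(zzz[0], zzz[1])  # rows are never duplicated
print(zzz)
```

Passing a seeded random.Random instance as rng makes the result reproducible.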
