Shuffling NumPy array along a given axis

Question

Given the following NumPy array,

> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

it's simple enough to shuffle a single row,

> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

Is it possible to use indexing notation to shuffle each of the rows independently, or do you have to iterate over the array? I had in mind something like,

> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output

though this clearly doesn't work.

score 29 · Answer 1 · edited Jun 20 '20 at 09:12

Vectorized solution with `rand+argsort` trick

We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate the unique indices, we draw random floats and argsort them along that axis, which gives us a vectorized solution. We also generalize it to n-dimensional arrays and arbitrary axes with np.take_along_axis. The final implementation would look something like this -

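import numpy as np

def shuffle_along_axis(a, axis):
    # Argsorting random floats along `axis` yields an independent index permutation per slice
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)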

Note that this shuffle won't be in-place and returns a shuffled copy.

Sample run -

In [33]: a
Out[33]: 
array([[18, 95, 45, 33],
       [40, 78, 31, 52],
       [75, 49, 42, 94]])

In [34]: shuffle_along_axis(a, axis=0)
Out[34]: 
array([[75, 78, 42, 94],
       [40, 49, 45, 52],
       [18, 95, 31, 33]])

In [35]: shuffle_along_axis(a, axis=1)
Out[35]: 
array([[45, 18, 33, 95],
       [31, 78, 52, 40],
       [42, 75, 94, 49]])
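
A minimal sketch of the same idea using the newer `Generator` API, so that a run can be made reproducible with a seed. The `rng` parameter is an illustrative addition and not part of the function above; the check at the end only confirms that each row is a permutation of the original row.

```python
import numpy as np

def shuffle_along_axis(a, axis, rng=None):
    # `rng` is a hypothetical extra parameter added for this sketch
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along `axis` gives a permutation per slice
    idx = rng.random(a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])
shuffled = shuffle_along_axis(a, axis=1, rng=np.random.default_rng(0))

# Sorting each row undoes any row-wise shuffle, so the sorted rows must match
assert np.array_equal(np.sort(shuffled, axis=1), np.sort(a, axis=1))
# The input is left untouched because the function returns a copy
assert a[0, 0] == 18
```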

Interesting solution! However, I ran a quick experiment and it was far slower (on the order of 1000x) than the naive solution below, which repeatedly invokes rng.shuffle. Can anyone confirm this? Why is it so slow? – Nils, Mar 24 '22 at 13:42
@Nils I am not sure the naive solution you are referring to is still here, but an explanation would be that rng.shuffle only does in-place shuffling (O(n) time complexity). For this solution you have to allocate memory for the unique indices, sort with argsort (O(n log n) time complexity), and then allocate new memory for the result as well. Thus the naive solution scales better for large arrays. – Naphat Amundsen, Dec 22 '22 at 10:37
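
For anyone who wants to check the timing question themselves, here is a rough harness, not a rigorous benchmark. The array shape, the repeat count, and the two helper names are arbitrary choices made for this sketch; results will depend on array shape, NumPy version, and hardware, so no expected numbers are given.

```python
import timeit
import numpy as np

rng = np.random.default_rng()

def argsort_shuffle(a):
    # Vectorized: random keys, argsort per row, then gather (returns a copy)
    idx = rng.random(a.shape).argsort(axis=1)
    return np.take_along_axis(a, idx, axis=1)

def loop_shuffle(a):
    # Python-level loop: copy once, then shuffle each row view in place
    out = a.copy()
    for row in out:
        rng.shuffle(row)
    return out

# Hypothetical test shape: 1000 rows of 50 columns, each row 0..49
a = np.tile(np.arange(50), (1000, 1))

for fn in (argsort_shuffle, loop_shuffle):
    seconds = timeit.timeit(lambda: fn(a), number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```

Trying a few shapes (many short rows versus a few long rows) is probably more informative than a single measurement.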

score 24 · Sven Marnach · Accepted Answer · 2020-05-26T12:31:17.617

You have to call numpy.random.shuffle() several times because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest and most efficient code to shuffle all rows of a two-dimensional array a separately probably is

list(map(numpy.random.shuffle, a))

Some people prefer to write this as a list comprehension instead:

[numpy.random.shuffle(x) for x in a]
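
Because both forms call the function only for its side effect and build a throwaway list of `None` values, a plain loop may read more clearly. A minimal sketch, using the array from the question:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])

# Iterating over a 2-D array yields views of its rows,
# so shuffling each view in place modifies `a` itself
for row in a:
    np.random.shuffle(row)

print(a)
```

This only works row by row on a two-dimensional array; shuffling along another axis would need a transpose or one of the other approaches on this page.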

edited May 26 '20 at 12:31

answered Feb 18 '11 at 17:15


Thanks, simple and clean solution. – lafras Feb 21 '11 at 11:22
at least for python 3.5, numpy 1.10.2, this doesn't work, a remains unchanged. – drevicko Mar 16 '16 at 17:22
@drevicko: What dimension does your array have? This answer is for shuffling all rows of a two-dimensional array (and I'm sure it also works with your combination of Python and Numpy versions). – Sven Marnach Mar 16 '16 at 22:12
Aha! I see what happened: in Python 3.5, map is lazy, producing an iterator, and doesn't do the mapping until you iterate through it. If you do e.g.: `for _ in map(...): pass` it'll work. – drevicko Mar 21 '16 at 15:40
@drevicko That makes sense. It might be best to write that code as `for x in a: numpy.random.shuffle(x)` then. – Sven Marnach Mar 21 '16 at 15:57
I guess so.. You do get a view when you iterate over `a`, don't you? There's also a messy one-liner: `list(map(...))` if `a` isn't too big, but a for loop starts to look more attractive ;) – drevicko Mar 21 '16 at 16:09
@drevicko The for loop basically does the same as `map()`: it uses Python's iterator protocol to iterate over `a`. It calls `a.__iter__()` to retrieve an iterator for `a`, and then calls the `__next__()` method on that iterator until `StopIteration` is raised. In this particular case, with `a` being a two-dimensional Numpy array, the items returned by the `__next__()` method are indeed views for the respective rows. In case of a one-dimensional array, you'd simply get the values of the elements. – Sven Marnach Mar 21 '16 at 21:11
or `[*map(numpy.random.shuffle, a)]` to be simpler. – Frost-Lee May 26 '20 at 11:57

score 5 · Answer 3 · answered Jan 06 '23 at 15:21

For those looking at this question more recently, NumPy provides the Generator.permuted method, which shuffles an array independently along the specified axis.

From their documentation (using random.Generator)

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)
x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

y = rng.permuted(x, axis=1)
y
array([[ 4,  3,  6,  7,  1,  2,  5,  0],  
       [15, 10, 14,  9, 12, 11,  8, 13],
       [17, 16, 20, 21, 18, 22, 23, 19]])
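
If an in-place shuffle is wanted, as in the original question, `permuted` also accepts an `out` argument. A short sketch, assuming NumPy 1.20 or newer (where `Generator.permuted` is available):

```python
import numpy as np

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)

# Passing the input as `out` shuffles each row of `x` in place;
# the returned array is the same object as `x`
y = rng.permuted(x, axis=1, out=x)

print(y is x)
print(x)
```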

Great answer and exactly what I was looking for - this is the canonical way to do this now. — Praveen, Apr 21 '23 at 21:07
