How to efficiently shuffle some values of a numpy array while keeping their relative order?

Question

I have a NumPy array and a mask specifying which entries of that array should be moved to random new positions while keeping their relative order. Here is an example:

In [2]: arr = np.array([5, 3, 9, 0, 4, 1])

In [4]: mask = np.array([True, False, False, False, True, True])

In [5]: arr[mask]
Out[5]: array([5, 4, 1]) # These entries shall be shuffled inside arr, while keeping their order.

In [6]: np.where(mask==True)
Out[6]: (array([0, 4, 5]),)

In [7]: shuffle_array(arr, mask)  # I'm looking for an efficient realization of this function!
Out[7]: array([3, 5, 4, 9, 0, 1]) # See how the entries 5, 4 and 1 haven't changed their order.

I've written some code that can do this, but it's really slow.

import numpy as np
def shuffle_array(arr, mask):
    perm = np.arange(len(arr))  # permutation array
    n = mask.sum()
    if n > 0:
        old_true_pos = np.where(mask == True)[0]  # old positions for which mask is True
        old_false_pos = np.where(mask == False)[0] # old positions for which mask is False

        new_true_pos = np.random.choice(perm, n, replace=False)  # draw new positions
        new_true_pos.sort()
        new_false_pos = np.setdiff1d(perm, new_true_pos)

        new_pos = np.hstack((new_true_pos, new_false_pos))
        old_pos = np.hstack((old_true_pos, old_false_pos))
        perm[new_pos] = perm[old_pos]

    return arr[perm]

To make things worse, I actually have two large matrices A and B of shape (M, N). Matrix A holds arbitrary values, while each row of matrix B is the mask to use for shuffling the corresponding row of matrix A according to the procedure outlined above. So what I want is shuffled_matrix = row_wise_shuffle(A, B).

The only way I have found so far to do this is with my shuffle_array() function and a for loop.
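For reference, the loop-based baseline described here might look roughly like the sketch below. The name row_wise_shuffle_loop, the rng argument, and the compact shuffle_array stand-in are assumptions made for illustration; this is not the asker's exact code, only a version with the same intended behaviour (masked entries move to random positions, and both masked and unmasked entries keep their relative order). It assumes the masks are boolean arrays.

```python
import numpy as np

def shuffle_array(arr, mask, rng):
    """Compact stand-in: move masked entries to random positions, keeping order."""
    n = int(mask.sum())
    # Draw n distinct target positions; sorting them preserves relative order.
    new_true_pos = np.sort(rng.choice(arr.size, size=n, replace=False))
    new_mask = np.zeros(arr.size, dtype=bool)
    new_mask[new_true_pos] = True
    out = np.empty_like(arr)
    out[new_mask] = arr[mask]      # masked values, in their original order
    out[~new_mask] = arr[~mask]    # remaining values, in their original order
    return out

def row_wise_shuffle_loop(A, B, rng=None):
    """Hypothetical baseline: one Python-level call per row of A."""
    rng = np.random.default_rng() if rng is None else rng
    return np.stack([shuffle_array(row, m, rng) for row, m in zip(A, B)])

arr = np.array([5, 3, 9, 0, 4, 1])
mask = np.array([True, False, False, False, True, True])
print(row_wise_shuffle_loop(arr[None, :], mask[None, :]))
```

The Python-level loop over rows is what makes this slow for large M, which is the part a vectorized solution needs to remove.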

Can you think of a NumPy-idiomatic way to accomplish this task without loops? Thank you so much in advance!

armamut · Accepted Answer · 2021-01-18T12:22:06.640

For the 1d case:

import numpy as np

a = np.arange(8)
b = np.array([1,1,1,1,0,0,0,0])
# Get ordered values
ordered_values = a[np.where(b==1)]
# We'll shuffle both arrays
shuffled_ix = np.random.permutation(a.shape[0])
a_shuffled = a[shuffled_ix]
b_shuffled = b[shuffled_ix]
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = ordered_values
a_shuffled # Notice that 0, 1, 2, 3 preserve their order.

>>>
array([0, 1, 2, 6, 3, 4, 7, 5])
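The same steps can be wrapped in a small function so the order-preserving property is easy to check. This is only a sketch: the name shuffle_masked_1d and the optional rng argument (a np.random.Generator) are assumptions, not part of the answer's code.

```python
import numpy as np

def shuffle_masked_1d(a, b, rng=None):
    """Shuffle a, then put the masked values back in their original order."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    b = np.asarray(b, dtype=bool)
    ordered_values = a[b]              # masked values in original order
    ix = rng.permutation(a.shape[0])   # one permutation applied to both arrays
    a_shuffled = a[ix]                 # fancy indexing returns a copy
    b_shuffled = b[ix]
    # The mask now marks the new positions; fill them left to right.
    a_shuffled[b_shuffled] = ordered_values
    return a_shuffled

a = np.arange(8)
b = np.array([1, 1, 1, 1, 0, 0, 0, 0])
out = shuffle_masked_1d(a, b)
# The masked values still appear in the order 0, 1, 2, 3.
assert out[np.isin(out, a[b == 1])].tolist() == [0, 1, 2, 3]
```

The unmasked values (4 to 7 here) end up in whatever order the permutation produces, as in the output above.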

For the 2d case, a columnwise shuffle (along axis=1, i.e. the entries of each row are shuffled across columns):


import numpy as np

a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])

# The code below works for column shuffle (i.e. axis=1).
# Get ordered values
i,j = np.where(b==1)
values = a[i, j]
values

# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*a.shape).argsort(axis=1)
a_shuffled = np.take_along_axis(a,idx,axis=1)
b_shuffled = np.take_along_axis(b,idx,axis=1)

# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = values

# Get the result
a_shuffled # see that 4,5 | 6,7,8 | 12,13,14,15 | 20,21 preserve their order
>>>
array([[ 4,  1,  0,  3,  2,  5],
       [ 9,  6,  7, 11,  8, 10],
       [12, 13, 16, 17, 14, 15],
       [23, 20, 19, 22, 21, 18]])
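In the output above the unmasked entries of each row land in arbitrary order (for example 1, 0, 3, 2 in the first row), whereas the question's example also keeps them in order (3, 9, 0). If that matters, one possible variation is to shuffle only the mask and then fill masked and unmasked slots separately. The sketch below is a variation on the code above rather than the answer's method; the name shuffle_masked_rows and the rng argument are assumptions.

```python
import numpy as np

def shuffle_masked_rows(a, b, rng=None):
    """Per row: random new positions for masked entries; all orders preserved."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    b = np.asarray(b, dtype=bool)
    # Independent random permutation per row (argsort of random numbers).
    idx = rng.random(a.shape).argsort(axis=1)
    new_b = np.take_along_axis(b, idx, axis=1)   # shuffled mask
    out = np.empty_like(a)
    # Boolean indexing walks the array in row-major order, and each row of
    # new_b has as many True entries as the same row of b, so values stay
    # in their own row and in their original order.
    out[new_b] = a[b]
    out[~new_b] = a[~b]
    return out

a = np.arange(24).reshape(4, 6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
print(shuffle_masked_rows(a, b))
```

This avoids any Python-level loop over rows; the cost is dominated by the argsort along axis=1.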

For the 2d case, a rowwise shuffle (along axis=0, i.e. the entries of each column are shuffled across rows), we can use the same code: transpose the arrays first, shuffle, then transpose back:


import numpy as np

a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])

# The code below shuffles along axis=1,
# so for a rowwise shuffle we transpose first.
at = a.T
bt = b.T

# Get ordered values
i,j = np.where(bt==1)
values = at[i, j]
values

# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*at.shape).argsort(axis=1)
at_shuffled = np.take_along_axis(at,idx,axis=1)
bt_shuffled = np.take_along_axis(bt,idx,axis=1)

# Replace the values with correct order
at_shuffled[np.where(bt_shuffled==1)] = values

# Get the result
a_shuffled = at_shuffled.T
a_shuffled # see that 6,12 | 7,13 | 8,14,20 | 15,21 preserve their order
>>>
array([[ 6,  7,  2,  3, 10, 17],
       [18, 19,  8, 15, 16, 23],
       [12, 13, 14, 21,  4,  5],
       [ 0,  1, 20,  9, 22, 11]])
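Since the two 2d variants differ only in the transpose, they could be folded into one function with an axis argument. The following is a sketch under that assumption; shuffle_masked, its axis parameter and the rng argument are hypothetical names, and np.moveaxis plays the role of the explicit .T calls.

```python
import numpy as np

def shuffle_masked(a, b, axis=1, rng=None):
    """Shuffle along `axis`, keeping masked values in their original order."""
    rng = np.random.default_rng() if rng is None else rng
    # Bring the shuffle axis to the end so the same logic covers every case.
    at = np.moveaxis(np.asarray(a), axis, -1)
    bt = np.moveaxis(np.asarray(b, dtype=bool), axis, -1)
    values = at[bt]                                  # masked values, in order
    idx = rng.random(at.shape).argsort(axis=-1)      # independent permutations
    at_shuffled = np.take_along_axis(at, idx, axis=-1)   # new array, not a view
    bt_shuffled = np.take_along_axis(bt, idx, axis=-1)
    at_shuffled[bt_shuffled] = values                # restore original order
    return np.moveaxis(at_shuffled, -1, axis)

a = np.arange(24).reshape(4, 6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
print(shuffle_masked(a, b, axis=0))   # shuffle within each column
print(shuffle_masked(a, b, axis=1))   # shuffle within each row
```

With axis=1 this matches the case asked about in the question, where each row of the mask matrix applies to the corresponding row of the data.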
