Is there an idiom or API for synchronized shuffling of Python arrays?

Question

Is there an API in NumPy (or perhaps TensorFlow) for performing a synchronized shuffling of several arrays (with the same first dimension)?

For example, if I have two arrays with dimensions (N, A) and (N, B), I want to randomize the ordering of the N elements of each while maintaining the association between the elements of the first array and those of the second.

Is there an API or Python idiom for accomplishing this?

Note that combining these into a single array of N tuples, which is then shuffled with random.shuffle, might be an option that I'd accept as an answer, but I can't get that to work. Getting the original arrays back is (as near as I've managed) messy, since combined_array[:,0] will have dimension (N,) with objects as elements, rather than dimension (N, A), unless it is manually rebuilt with something like [x for x in combined_array[:,0]].

score 4 · Accepted Answer · answered Jul 27 '17 at 21:14

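import numpy

# N is the shared size of the first dimension, e.g. N = len(arr1)
permutation = numpy.random.permutation(N)

# Indexing both arrays with the same permutation keeps their rows paired
arr1_shuffled = arr1[permutation]
arr2_shuffled = arr2[permutation]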

Pick one permutation and use it for both arrays.
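
As a minimal, self-contained sketch of the same idea (the array contents, the sizes N, A, B, and the seeded generator are assumptions made up for illustration, not part of the original answer), the shared permutation can be checked on small arrays whose rows encode their original index:

```python
import numpy as np

# Seeded generator only so that the example is reproducible (an assumption)
rng = np.random.default_rng(0)

# Hypothetical sizes: N rows, A and B columns
N, A, B = 5, 3, 2
arr1 = np.arange(N * A).reshape(N, A)  # row i starts with the value i * A
arr2 = np.arange(N * B).reshape(N, B)  # row i starts with the value i * B

# One permutation of the row indices, applied to both arrays
permutation = rng.permutation(N)
arr1_shuffled = arr1[permutation]
arr2_shuffled = arr2[permutation]

# Recover the original row index from each shuffled row
rows_from_arr1 = arr1_shuffled[:, 0] // A
rows_from_arr2 = arr2_shuffled[:, 0] // B
print(np.array_equal(rows_from_arr1, rows_from_arr2))
```

Because both arrays are indexed with the same permutation, the recovered row indices agree, and the shapes (N, A) and (N, B) are preserved. Indexing with an integer array returns new arrays, so arr1 and arr2 themselves are left unshuffled.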

user2357112

260,549
28
431
505

score 0 · Answer 2 · answered Jul 27 '17 at 21:18

An easy way to get around this, without having to touch or combine the original collections, would be to use a randomized bijective function on the indexes of the elements, in other words a random permutation of the indexes. You could then apply that same mapping to both arrays (or to any number of them, really) to get the shuffled result.

This general idea can be applied to really any kind of indexable collection and is not limited to NumPy arrays.

The easiest way to do this would be to have a shuffled list of all indexes, and then, to iterate through the synchronized arrays randomly, you simply iterate through the shuffled index list and access the elements at the current index:

from random import shuffle

# These would be your input lists (or arrays, doesn’t really matter)
list1 = […]
list2 = […]

# generate a list of all indexes
indexes = list(range(len(list1)))

# shuffle the indexes
shuffle(indexes)

for i in indexes:
    print(list1[i], list2[i])

Of course you could also use this to create new lists/arrays with the new order.
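
A minimal sketch of that last step, assuming two small example lists (the contents of list1 and list2 and the names shuffled1 and shuffled2 are made up for illustration):

```python
import random

# Hypothetical input lists of equal length
list1 = ["a", "b", "c", "d"]
list2 = [1, 2, 3, 4]

# Shuffle a list of indexes once
indexes = list(range(len(list1)))
random.shuffle(indexes)

# Build new lists in the shuffled order; the originals are untouched
shuffled1 = [list1[i] for i in indexes]
shuffled2 = [list2[i] for i in indexes]

for x, y in zip(shuffled1, shuffled2):
    print(x, y)
```

Each printed pair is one of the original pairs ("a" with 1, "b" with 2, and so on); only their order changes between runs.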
