Select cells randomly from NumPy array - without replacement

Question

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).

I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?

Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.

score 21 · Accepted Answer · answered Oct 08 '10 at 13:52

How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?

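If you need to change the array in-place, then you can create an index array like this:

import numpy

your_array = numpy.arange(12).reshape(3, 4)  # example stand-in for <some numpy array>
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)

# index the flattened array so this also works for multi-dimensional arrays
print(your_array.ravel()[index_array[:10]])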
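For the 2D case raised in the question's update, one possible sketch is to shuffle the flat positions and then convert them to (row, column) pairs with numpy.unravel_index. The 3x4 example array and the names flat_order, rows, cols and visited are illustrative assumptions, not part of the original answer.

```python
import numpy

# Illustrative 2D array; the shape is an assumption for this sketch
your_array = numpy.arange(12).reshape(3, 4)

# Random ordering of all flat positions, each appearing exactly once
flat_order = numpy.random.permutation(your_array.size)

# Convert flat positions to (row, column) index arrays
rows, cols = numpy.unravel_index(flat_order, your_array.shape)

visited = []
for r, c in zip(rows, cols):
    visited.append(your_array[r, c])  # process each cell here
```

Every cell is visited exactly once, and your_array itself is left untouched.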

Wolph

Thanks for your answer. Looks like I should have mentioned in my question that this is a 2D array, and I'd like to get the 2D array indices out for every cell, randomly without replacement. Is there a way to do that easily? – robintw Oct 08 '10 at 14:16
@robintw - `numpy.random.shuffle` should work perfectly on n-dimensional arrays. If you want the indices, you might try making row and column index arrays (look into `meshgrid`) and then shuffling them. – Joe Kington Oct 08 '10 at 14:25
@robintw: 2D arrays are no problem either, you can simply `reshape()` to get 2D instead of 1D :) – Wolph Oct 09 '10 at 21:17
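
One caveat worth checking before relying on shuffle for n-dimensional arrays: numpy.random.shuffle only reorders a multi-dimensional array along its first axis, so for a 2D array whole rows move but the contents of each row stay together. A small sketch, using an arbitrary 3x4 array as an assumption; shuffling a flattened view mixes all cells instead.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# shuffle() on a 2D array permutes whole rows only
np.random.shuffle(a)
rows_intact = all((np.diff(row) == 1).all() for row in a)

# Shuffling a flat view reorders every cell of the same array in place
b = np.arange(12).reshape(3, 4)
np.random.shuffle(b.ravel())  # ravel() is a view for a contiguous array
```

Here rows_intact is True because each row of a still holds four consecutive values, whereas b has been shuffled cell by cell.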

score 6 · Answer 2 · answered Sep 08 '13 at 21:20

All of these answers seemed a little convoluted to me.

I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.

The following code will do this in a simple and straightforward manner:

#!/usr/bin/python
import numpy as np

#Define a two-dimensional array
#Use any number of dimensions, and dimensions of any size
d=np.zeros(30).reshape((5,6))

#Get a list of indices for an array of this shape
indices=list(np.ndindex(d.shape))

#Shuffle the indices in-place
np.random.shuffle(indices)

#Access array elements using the indices to do cool stuff
for i in indices:
  d[i]=5

print(d)

Printing d verifies that all elements have been accessed.

Note that the array can have any number of dimensions and that the dimensions can be of any size.

The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think of how to build a shuffled iterator offhand.
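
A partial workaround, sketched under assumptions: a generator can hold a single 1D permutation of flat positions and convert each one to an index tuple on demand with np.unravel_index. This is not a truly lazy shuffle, since the permutation still has one entry per cell, but it avoids building a list of tuples. The function name shuffled_indices is hypothetical.

```python
import numpy as np

def shuffled_indices(shape):
    """Yield every index tuple of an array with this shape, in random order."""
    size = int(np.prod(shape))
    # One 1D integer array is stored instead of a list of tuples
    for flat in np.random.permutation(size):
        # unravel_index converts a flat position to an index tuple
        yield np.unravel_index(flat, shape)

d = np.zeros((5, 6))
for i in shuffled_indices(d.shape):
    d[i] = 5
```

As in the code above, every element of d ends up set to 5.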

score 2 · Answer 3 · edited May 23 '17 at 11:47

Extending the nice answer from @WoLpH

For a 2D array I think it will depend on what you want or need to know about the indices.

You could do something like this:

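import numpy as np

data = np.arange(25).reshape((5,5))

# data == data is True for every cell (assuming no NaNs), so where() returns all indices
x, y = np.where(data == data)
idx = list(zip(x, y))
np.random.shuffle(idx)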

OR

data = np.arange(25).reshape((5,5))

grid = np.indices(data.shape)
idx = list(zip(grid[0].ravel(), grid[1].ravel()))
np.random.shuffle(idx)

You can then iterate over the list idx to visit the 2D array indices in random order and read the value at each index, while data itself remains unchanged.

Note: You could also generate the indices with itertools.product and then shuffle them, in case you are more comfortable with that set of tools.
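
A minimal sketch of the itertools.product variant, assuming the same 5x5 data array as above. Shuffling with the standard library's random.shuffle and the name values are choices made for this illustration.

```python
import itertools
import random

import numpy as np

data = np.arange(25).reshape((5, 5))

# All (row, column) pairs, built from one range per axis
idx = list(itertools.product(*(range(n) for n in data.shape)))

# Shuffle the list of tuples in place with the standard library
random.shuffle(idx)

# Read each cell once, in random order
values = [data[i] for i in idx]
```

Because the ranges are derived from data.shape, the same lines work for arrays with more than two dimensions.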

What is `a` in the first example? Also the expression `a=a` evaluates to `True` which can't be what you intend from a numpy `where` call (`numpy.where` takes in a masked array). Did you mean something like `x,y = np.where(data == data)`? — Hooked, Mar 29 '12 at 13:58

score 1 · Answer 4 · answered Aug 11 '13 at 22:19

People using NumPy version 1.7 or later can also use the built-in function numpy.random.choice.
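
A short sketch of how that might look for the 2D case, assuming a 5x5 example array: passing replace=False makes numpy.random.choice draw each flat position at most once, and np.unravel_index turns the positions back into 2D indices. The names data, flat, rows, cols and picked are illustrative.

```python
import numpy as np

data = np.arange(25).reshape((5, 5))

# Draw every flat index exactly once; replace=False prevents repeats
flat = np.random.choice(data.size, size=data.size, replace=False)

# Map the flat positions back to 2D indices
rows, cols = np.unravel_index(flat, data.shape)
picked = data[rows, cols]
```

Using a smaller size draws a random subset of cells instead of all of them.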

ajeje

score 1 · Answer 5 · answered Oct 08 '10 at 17:36

Use random.sample to generate ints in 0 .. A.size - 1 with no duplicates, then split them into index pairs:

import random
import numpy as np

def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( range( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs


if __name__ == "__main__":
    import sys

    nsample = 8
    ncol = 5
    exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))

    r = randint2_nodup( nsample, A )
    print( "r:", r )
    print( "A[r]:", A[r] )
    for jk in zip(*r):
        print( jk, A[jk] )

score 1 · Answer 6 · answered Jan 15 '11 at 07:29

Let's say you have an array of data points of size 8x3:

data = np.arange(50,74).reshape(8,-1)

If you truly want to sample, as you say, all the indices as 2D pairs, the most compact way to do this that I can think of is:

#generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size),data.shape[1])

#iterate over it
for x,y in zip(*idxs): 
    #do something to data[x,y] here
    pass

More generally, though, one often does not need to access a 2D array as a 2D array simply to shuffle it, in which case one can be yet more compact: just make a 1D view onto the array and save yourself some index-wrangling.

flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass

Changes made through flat_data still show up in the original 2D array, since ravel() returns a view here. To see this, try:

flat_data[12] = 1000000
print(data[4, 0])
# prints 1000000
