
My google-fu has failed me! I have a 10x10 numpy array initialized to 0 as follows:

arr2d = np.zeros((10,10))

For each row in arr2d, I want to assign 3 random columns to 1. I am able to do it using a loop as follows:

for row in arr2d:
    rand_cols = np.random.randint(0,9,3)
    row[rand_cols] = 1

output:

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

Is there a way to exploit numpy or array indexing/slicing to achieve the same result in a more pythonic/elegant way (preferably in 1 or 2 lines of code)?

Divakar
codemaniac

3 Answers


Once arr2d is initialized with arr2d = np.zeros((10,10)), you can use a vectorized two-liner like so -

# Generate random unique 3 column indices for 10 rows
idx = np.random.rand(10,10).argsort(1)[:,:3]

# Assign them into initialized array
arr2d[np.arange(10)[:,None],idx] = 1

Or cram everything into a one-liner if you prefer it that way -

arr2d[np.arange(10)[:,None],np.random.rand(10,10).argsort(1)[:,:3]] = 1

Sample run -

In [11]: arr2d = np.zeros((10,10))  # Initialize array

In [12]: idx = np.random.rand(10,10).argsort(1)[:,:3]

In [13]: arr2d[np.arange(10)[:,None],idx] = 1

In [14]: arr2d # Verify by manual inspection
Out[14]: 
array([[ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  1.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.]])

In [15]: arr2d.sum(1) # Verify by counting ones in each row
Out[15]: array([ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.])
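
To make the indexing step easier to follow, here is a minimal sketch of the same idea with the intermediate shapes spelled out. The names rng, n_rows, n_cols, k and rows are mine, not from the answer, and the newer np.random.default_rng generator is used in place of np.random.rand purely as an assumption about what a current NumPy install offers.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

n_rows, n_cols, k = 10, 10, 3
arr2d = np.zeros((n_rows, n_cols))

# Argsorting a row of random keys yields a random permutation of the
# column indices; its first k entries are therefore k distinct columns.
idx = rng.random((n_rows, n_cols)).argsort(axis=1)[:, :k]  # shape (10, 3)

# A column vector of row numbers, shape (10, 1), broadcasts against idx,
# so position (i, j) of the index pair addresses arr2d[i, idx[i, j]].
rows = np.arange(n_rows)[:, None]

arr2d[rows, idx] = 1
row_sums = arr2d.sum(axis=1)  # every entry should equal k
```

Without the [:, None], np.arange(10) would have shape (10,) and could not broadcast against the (10, 3) index array.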

Note: If performance matters, I would suggest an np.argpartition-based approach, as listed in this other post.
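
The linked post is not reproduced here, so the following is only a sketch of what an np.argpartition variant might look like, not necessarily the code from that post. The variable names are assumptions, and I have not benchmarked it.

```python
import numpy as np

rng = np.random.default_rng()

n_rows, n_cols, k = 10, 10, 3
arr2d = np.zeros((n_rows, n_cols))

# argpartition only guarantees that the k smallest keys of each row end up
# in the first k positions (in arbitrary order), so no full sort is needed.
keys = rng.random((n_rows, n_cols))
idx = np.argpartition(keys, k, axis=1)[:, :k]  # shape (10, 3)

arr2d[np.arange(n_rows)[:, None], idx] = 1
```

The selected columns are still distinct within each row, because idx is a slice of a permutation of the column indices.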

Divakar
  • Really cool. There's a whole lot of cleverness packed into these two lines of code. – zarak Aug 22 '16 at 18:59
  • @zarak The original idea came from this post - http://stackoverflow.com/a/29156976/3293881. The speedups against a loopy approach are listed here : http://stackoverflow.com/a/31958263/3293881 – Divakar Aug 22 '16 at 19:05

Use answers from this question to generate non-repeating random numbers. You can use random.sample from Python's random module, or np.random.choice.

So, just a small modification to your code:

>>> import numpy as np
>>> for row in arr2d:
...     rand_cols = np.random.choice(range(10), 3, replace=False)
...     # Or the python standard lib alternative (use `import random`)
...     # rand_cols = random.sample(range(10), 3)
...     row[rand_cols] = 1
...
>>> arr2d
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]])

I don't think you can really leverage column slicing here to set values to 1, unless you're generating the randomized array from scratch. This is because your column indices are random for each row. You're better off leaving it in the form of a loop for readability.
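
As a quick illustration of why non-repeating indices matter, here is a small sketch (names are mine) comparing sampling with and without replacement. Note also that np.random.randint(0, 9, 3) treats the upper bound as exclusive, so the loop in the question can never pick column 9.

```python
import numpy as np

rng = np.random.default_rng()

# Sampling with replacement can repeat a column index, which is how a row
# can end up with fewer than 3 ones.
with_repeats = rng.integers(0, 10, size=3)

# Sampling without replacement always yields 3 distinct indices in 0..9.
distinct = rng.choice(10, size=3, replace=False)

n_unique = len(np.unique(distinct))  # always 3
```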

Praveen
  • FYI: You can generate non-repeating random numbers using `numpy.random.choice(10, size=3, replace=False)`. This is described in one of the answers to the question that you linked. – Warren Weckesser Aug 20 '16 at 04:55
  • @WarrenWeckesser I did notice, but I didn't include it because it was the second result. I'll add it as an alternative. Thanks! – Praveen Aug 20 '16 at 14:49
  • In fact, in retrospect, it's probably better to just use `np.random` in order to avoid having two very similar imports, which could get pretty confusing. – Praveen Aug 20 '16 at 14:54

I'm not sure how good this would be in terms of performance, but it's fairly concise.

arr2d[:, :3] = 1
# In Python 3, map() is lazy, so loop explicitly to shuffle each row in place
for row in arr2d:
    np.random.shuffle(row)
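
If a recent NumPy is available (1.20 or newer, which is an assumption on my part), the same idea can be written without a Python-level loop using Generator.permuted, which shuffles along an axis independently for each row. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng()

arr2d = np.zeros((10, 10))
arr2d[:, :3] = 1  # three ones at the start of every row

# permuted() shuffles each row independently along axis=1 and returns a
# new array; pass out=arr2d instead to shuffle in place.
arr2d = rng.permuted(arr2d, axis=1)
```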
zarak