1

I have a numpy array consisting of a lot of 0s and a few non-zero entries e.g. like this (just a toy example):

myArray = np.array([[ 0.       ,  0.       ,  0.79],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.435    ,  0.       ]])

Now I would like to move each of the non-zero entries with a given probability which means that some of the entries are moved, some might remain at the current position. Some of the rows are not allowed to contain a non-zero entry which means that values are not allowed to be moved there. I implemented that as follows:

import numpy as np

# for reproducibility
np.random.seed(2)

myArray = np.array([[ 0.       ,  0.       ,  0.79],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.435    ,  0.       ]])

# list of rows where numbers are not allowed to be moved to   
ignoreRows = [2]

# moving probability
probMove =  0.3

# get non-zero entries
nzEntries = np.nonzero(myArray) 

# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0], nzEntries[1]) 

# store values
valNZ = [myArray[i] for i in indNZ] 

# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(nzEntries)) 

allowedRows = [ind for ind in xrange(myArray.shape[0]) if ind not in ignoreRows]  # replace by "range" in python 3.x
allowedCols = [ind for ind in xrange(myArray.shape[1])]  # replace by "range" in python 3.x

for indProb, prob in enumerate(lProb):
    # only move with a certain probability
    if prob <= probMove:
        # randomly change position
        myArray[np.random.choice(allowedRows), np.random.choice(allowedCols)] = valNZ[indProb]

        # set old position to zero
        myArray[indNZ[indProb]] = 0.

print myArray   

First, I determine all the indices and values of the non-zero entries. Then I assign a certain probability to each of these entries which determines whether the entry will be moved. Then I get the allowed target rows.

In the second step, I loop through the list of indices and move them according to their moving probability which is done by choosing from the allowed rows and columns, assigning the respective value to these new indices and set the "old" value to 0.

It works fine with the code above, however, speed really matters in this case and I wonder whether there is a more efficient way of doing this.

EDIT: Hpaulj's answer helped me to get rid of the for-loop which is nice and the reason why I accepted his answer. I incorporated his comments and posted an answer below as well, just in case someone else stumbles over this example and wonders how I used his answer in the end.

Cleb
  • 25,102
  • 20
  • 116
  • 151
  • you may adapt a [variation of Fisher-Yates-Knuth](http://stackoverflow.com/a/28491007/3591273) suffling to shufle only selected items and/or only with given probabillity (just add a dice simulation like: `if rand() > p: hten shuffle`) etc.. – Nikos M. May 31 '15 at 16:35
  • Thanks for the suggestion; I'll take a look. Have not heard of this method before. – Cleb May 31 '15 at 16:45
  • When you profiled your *actual* code, which lines did you find to be in need of performance boost? – boardrider May 31 '15 at 17:55
  • @boardrider: That is difficult to answer. Currently, I do not know how to improve the code, it can just take a lot of time for large arrays. I once had another example in which I used two for loops and someone converted the code to a loop-free version by using only matrix operations which I would have never seen myself; the run time went down from more than a minute to less than a second. Therefore, I was curious whether this code could be improved as well e.g. by using Python tools I am not aware of, by replacing the for-loop by using a fancy one-liner etc. – Cleb May 31 '15 at 18:11
  • Does this work (and improve speed): `valNZ=myArray[nzEntries]`? It's a minor step compared to the `lProb` loop. – hpaulj May 31 '15 at 18:16
  • @hpaulj: That works, thanks! But you are right, the main concern is this lProb loop. – Cleb May 31 '15 at 18:26
  • Your current code has a number of bugs. For example, `np.nonzero` doesn't work the way you're using it. Rather than producing a list of `(r, c)` coordinate pairs, it will produce a pair of arrays where the first array has all the `r` coordinates and the second array has all the `c` coordinates. Also, after addressing that, it's quite possible for your code to move a nonzero entry over another nonzero entry (either before or after the other nonzero entry is moved). This seems unlikely to be deliberate. – user2357112 May 31 '15 at 20:16
  • @user2357112: Thanks for your comment. You are right about np.nonzero. The pair of coordinates are created in this line: indNZ = zip(nzEntries[0],nzEntries[1]). Yes, non-zero entries can be overwritten; that's on purpose and happens for small arrays quite frequently. I might also need a version where that does not happen in the future; in case you would like to elaborate on how to avoid overwriting non-zero entries, feel free to do so :) – Cleb May 31 '15 at 20:47
  • I see the `indNZ` part now; the part I was looking at is `lProb = np.random.rand(len(nzEntries))`, which should probably be using `len(indNZ)` or `len(valNZ)`. If the overwriting is desired behavior, that makes it trickier to vectorize without resorting to something like Cython. – user2357112 May 31 '15 at 21:20
  • @user2357112 nzEntries = (array([0, 3]), array([2, 1])); indNZ = [(0, 2), (3, 1)]; valNZ= array([ 0.79 , 0.435]) - the lengths of the elements you mention are identical. Well, I do not want to exclude overwriting, it can happen, but does not have to. – Cleb May 31 '15 at 21:31
  • @Cleb: They're identical in *this* case, but stick another nonzero entry in the array and they won't be. `indNZ` and `valNZ` have length equal to the number of nonzero entries (so they're always the same length), but `nzEntries` has length equal to the dimension of the array. – user2357112 May 31 '15 at 21:32
  • Oh, indeed, my bad - thanks for pointing it out! – Cleb May 31 '15 at 21:35

2 Answers2

2

You can index elements with arrays, so:

valNZ=myArray[nzEntries]

can replace the zip and comprehension.

Simplify these 2 assignments:

allowedCols=np.arange(myArray.shape[1]);
allowedRows=np.delete(np.arange(myArray.shape[0]), ignoreRows)

With:

I=lProb<probMove; valNZ=valNZ[I];indNZ=indNZ[I]

you don't need to perform the prog<probMove test each time in the loop; just iterate over valNZ and indNZ.

I think your random.choice can be generated for all of these valNZ at once:

np.random.choice(np.arange(10), 10, True)  
# 10 choices from the range with replacement

With that it should be possible to move all of the points without a loop.

I haven't worked out the details yet.

There is one way in which your iterative move will be different from any parallel one. If a destination choice is another value, the iterative approach can over write, and possibly move a given value a couple of times. Parallel code will not perform the sequential moves. You have to decide whether one is correct or not.

There is a ufunc method, .at, that performs unbuffered operations. It works for operations like add, but I don't know if would apply to an indexing move like this.


simplified version of the iterative moving:

In [106]: arr=np.arange(20).reshape(4,5)
In [107]: I=np.nonzero(arr>10)
In [108]: v=arr[I]
In [109]: rows,cols=np.arange(4),np.arange(5)

In [110]: for i in range(len(v)):
    dest=(np.random.choice(rows),np.random.choice(cols))
    arr[dest]=v[i]
    arr[I[0][i],I[1][i]] = 0

In [111]: arr
Out[111]: 
array([[ 0, 18,  2, 14, 11],
       [ 5, 16,  7, 13, 19],
       [10,  0,  0,  0,  0],
       [ 0, 17,  0,  0,  0]])

possible vectorized version:

In [117]: dest=(np.random.choice(rows,len(v),True),np.random.choice(cols,len(v),True)) 
In [118]: dest
Out[118]: (array([1, 1, 3, 1, 3, 2, 3, 0, 0]), array([3, 0, 0, 1, 2, 3, 4, 0, 1]))

In [119]: arr[dest]
Out[119]: array([ 8,  5, 15,  6, 17, 13, 19,  0,  1])
In [120]: arr[I]=0
In [121]: arr[dest]=v

In [122]: arr
Out[122]: 
array([[18, 19,  2,  3,  4],
       [12, 14,  7, 11,  9],
       [10,  0,  0, 16,  0],
       [13,  0, 15,  0, 17]])

If I sets 0 after, there are more zeros.

In [124]: arr[dest]=v    
In [125]: arr[I]=0
In [126]: arr
Out[126]: 
array([[18, 19,  2,  3,  4],
       [12, 14,  7, 11,  9],
       [10,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0]])

same dest, but done iteratively:

In [129]: for i in range(len(v)):
   .....:     arr[dest[0][i],dest[1][i]] = v[i]
   .....:     arr[I[0][i],I[1][i]] = 0

In [130]: arr
Out[130]: 
array([[18, 19,  2,  3,  4],
       [12, 14,  7, 11,  9],
       [10,  0,  0, 16,  0],
       [ 0,  0,  0,  0,  0]])

With this small size, and high moving density, the differences between iterative and vectorized solutions are large. For a sparse array they would be fewer.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
1

Below you can find the code I came up with after incorporating hpaulj's answer and the answer from this question. This way, I got rid of the for-loop which improved the code a lot. Therefore, I accepted hpaulj's answer. Maybe the code below helps someone else in a similar situation.

import numpy as np
from itertools import compress

# for reproducibility
np.random.seed(2)

myArray = np.array([[ 0.       ,  0.2       ,  0.79],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.435    ,  0.       ]])

# list of rows where numbers are not allowed to be moved to   
ignoreRows= []

# moving probability
probMove =  0.5

# get non-zero entries
nzEntries = np.nonzero(myArray) 

# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0],nzEntries[1]) 

# store values
valNZ = myArray[nzEntries]

# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ)) 

# get the rows/columns where the entries are allowed to be moved
allowedCols = np.arange(myArray.shape[1]);
allowedRows = np.delete(np.arange(myArray.shape[0]), ignoreRows)

# get the entries that are actually moved
I = lProb < probMove
print I
# get the values of the entries that are moved
valNZ = valNZ[I]

# get the indices of the entries that are moved
indNZ = list(compress(indNZ, I))

# get the destination for the entries that are moved
dest = (np.random.choice(allowedRows, len(valNZ), True), np.random.choice(allowedCols, len(valNZ), True))
print myArray
print indNZ
print dest

# set the old indices to 0
myArray[zip(*indNZ)] = 0

# move the values to their respective destination
myArray[dest] = valNZ
print myArray
Community
  • 1
  • 1
Cleb
  • 25,102
  • 20
  • 116
  • 151