
I am trying to create a copy of my numpy array that contains only certain values. This is the code I was using:

import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9]])
query_val = 5
B = (A == query_val) * np.array(query_val, dtype=np.uint16)

... which does exactly what I want.

Now, I'd like query_val to be more than just one value. The answer to "Numpy where function multiple conditions" suggests combining the comparisons with a logical and operation, but that is very space-inefficient because it applies == several times, creating a full-size intermediate result each time.

In my case, that means I don't have enough RAM to do it. Is there a way to do this properly in native numpy with minimal space overhead?

fns

1 Answer


Here's one approach using np.searchsorted -

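import numpy as np

def mask_in(a, b):
    # b must be sorted in ascending order for np.searchsorted
    idx = np.searchsorted(b, a)
    idx[idx == b.size] = 0  # clamp out-of-range indices so b[idx] is valid
    return np.where(b[idx] == a, a, 0)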

Sample run -

In [356]: a
Out[356]: 
array([[5, 1, 4],
       [4, 5, 6],
       [2, 4, 9]])

In [357]: b
Out[357]: array([2, 4, 5])

In [358]: mask_in(a,b)
Out[358]: 
array([[5, 0, 4],
       [4, 5, 0],
       [2, 4, 0]])
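
For comparison, here is a minimal sketch of the same masking written with np.isin, which is available in NumPy 1.13 and later. The function name mask_in_isin is made up for illustration. This variant does not require b to be sorted, but it is not claimed to use less memory than the np.searchsorted version, since np.isin allocates its own temporaries.

```python
import numpy as np

def mask_in_isin(a, b):
    # Hypothetical helper name; np.isin returns a boolean array shaped like a
    keep = np.isin(a, b)
    # Keep matching elements, replace everything else with 0
    return np.where(keep, a, 0)

a = np.array([[5, 1, 4],
              [4, 5, 6],
              [2, 4, 9]])
b = np.array([2, 4, 5])
result = mask_in_isin(a, b)
```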
Divakar
  • Thanks. Though, if I see this right, this needs to allocate many times the space of the full-size array (1. a, 2. idx, 3. idx==b.size, 4. b[idx], 5. b[idx]==a, 6. np.where(...)) while my code needs 2 full-size allocations. – fns May 23 '17 at 21:07
  • @fns We can easily write the results back to the input array using the mask `b[idx]==a` if you don't want a copy as output. So, do you want a copy or a write-back? – Divakar May 23 '17 at 21:15
  • @fns Other than that, we still have memory occupancy for `idx` and `b[idx]`. Since we have duplicates in `a`, there won't be an easy way around that. If memory is super critical for you, I think you should stick to a loopy solution. – Divakar May 23 '17 at 21:20
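
The two options raised in the comments can be sketched roughly as follows. Neither function appears in the thread: the names mask_in_inplace and mask_in_loop, and the uint16 default taken from the question, are assumptions made for illustration. The first writes back into the input array instead of building a copy with np.where. The second is one possible reading of a "loopy solution": it loops over the (presumably few) query values and reuses a single boolean buffer, so the full-size allocations are the output and one boolean array.

```python
import numpy as np

def mask_in_inplace(a, b):
    # Write-back variant: modifies a in place; b must be sorted ascending
    idx = np.searchsorted(b, a)
    idx[idx == b.size] = 0      # clamp out-of-range indices
    a[b[idx] != a] = 0          # zero every element that is not in b
    return a

def mask_in_loop(a, values, dtype=np.uint16):
    # Loop variant: one output array plus one reusable boolean buffer
    out = np.zeros(a.shape, dtype=dtype)
    tmp = np.empty(a.shape, dtype=bool)
    for v in values:
        np.equal(a, v, out=tmp)  # comparison result goes into the buffer
        out[tmp] = v             # copy the matched value into the output
    return out

a = np.array([[5, 1, 4],
              [4, 5, 6],
              [2, 4, 9]])
b = np.array([2, 4, 5])

looped = mask_in_loop(a, b)               # a is left untouched
in_place = mask_in_inplace(a.copy(), b)   # copy only to keep a for comparison
```

The loop variant trades speed for memory: it makes one pass over a per query value, so it suits a short list of values and a large array.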