
I have a numpy array and a list of valid values in that array:

import numpy as np
arr = np.array([[1,2,0], [2,2,0], [4,1,0], [4,1,0], [3,2,0], ... ])
valid = [1,4]

Is there a nice Pythonic way to set all array values that are not in the list of valid values to zero, and to do it in-place? After this operation, the array should look like this:

               [[1,0,0], [0,0,0], [4,1,0], [4,1,0], [0,0,0], ... ]

The following creates a copy of the array in memory, which is bad for large arrays:

arr = np.vectorize(lambda x: x if x in valid else 0)(arr)

It bugs me that, for now, I loop over each array element and set it to zero if it is not in the valid list.

Edit: I found an answer suggesting there is no in-place function to achieve this. Also, stop changing my whitespace. It's easier to see the changes in arr with it.

con-f-use

2 Answers


You can use np.place for an in-situ update -

np.place(arr,~np.in1d(arr,valid),0)

Sample run -

In [66]: arr
Out[66]: 
array([[1, 2, 0],
       [2, 2, 0],
       [4, 1, 0],
       [4, 1, 0],
       [3, 2, 0]])

In [67]: np.place(arr,~np.in1d(arr,valid),0)

In [68]: arr
Out[68]: 
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])

Along the same lines, np.put could also be used -

np.put(arr,np.where(~np.in1d(arr,valid))[0],0)

Sample run -

In [70]: arr
Out[70]: 
array([[1, 2, 0],
       [2, 2, 0],
       [4, 1, 0],
       [4, 1, 0],
       [3, 2, 0]])

In [71]: np.put(arr,np.where(~np.in1d(arr,valid))[0],0)

In [72]: arr
Out[72]: 
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])
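
To see what "in-situ" means here, the following sketch checks two things: that `arr` keeps the same data buffer after `np.place`, and how large the temporary boolean mask is compared with `arr`. It rests on assumptions not in the answer above: the array is given an explicit `int64` dtype so the size ratio is predictable, and `np.isin` (NumPy 1.13+) stands in for `np.in1d`; the variable names are illustrative only.

```python
import numpy as np

# Assumption: explicit int64 dtype, so each element takes 8 bytes.
arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]],
               dtype=np.int64)
valid = [1, 4]

# Address of arr's data buffer before the update.
ptr_before = arr.__array_interface__["data"][0]

# Boolean mask of the elements that are NOT in valid (1 byte per element).
mask = ~np.isin(arr, valid)
np.place(arr, mask, 0)

# Address of arr's data buffer after the update.
ptr_after = arr.__array_interface__["data"][0]

same_buffer = ptr_before == ptr_after    # arr itself was not reallocated
mask_ratio = mask.nbytes / arr.nbytes    # mask size relative to arr
```

So the update does not reallocate `arr`, but the mask is still a separate temporary array with one byte per element.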
Divakar
  • I suspect that would create a temporary copy as well since you reshape an array and multiply it by its original. So it fails my "in-place" requirement. You can see this by typing `arr` after the operation. It still gives you the original array with twos and threes. – con-f-use Jan 09 '16 at 22:32
  • You should really just make a new answer instead of editing your old one over and over. It makes discussion difficult. With `np.place(arr,~np.in1d(arr,valid),0)` the `in1d` part still creates a memory copy of the whole array, same with `np.put(arr,np.where(~np.in1d(arr,valid))[0],0)`. – con-f-use Jan 09 '16 at 22:48
  • @con-f-use As I understand, `~np.in1d(arr,valid)` would be a boolean array and as such would occupy less memory than a typical integer array, correct me if I am wrong here. And then it does in-situ update in `arr`, so there is no other copy involved there. – Divakar Jan 09 '16 at 22:50
  • Agreed, `np.zeros((100),dtype=bool).nbytes` < `np.zeros((100),dtype=float).nbytes`. But my problem is that `arr` already barely fits into memory on some target systems, and even an additional array that is 1/8 the size of the original might cause problems. Also, it's a matter of principle ;-) However, I thank you for all your time! I'm quite convinced there is no in-place function and the loop is unavoidable. See my edit.
  • @con-f-use Yeah I guess that's the trade-off you need to make there, sorry! can't help it I guess :) Or maybe do it in chunks if you prefer a middle ground! – Divakar Jan 09 '16 at 23:04
  • All suggestions are 'in-place' in the sense that `arr` has the same data buffer pointer after the change. But they do all make a mask (temporary or not) of comparable size. To save both memory and time I think you need to use `nditer` in `cython` (or other C level coding). – hpaulj Jan 10 '16 at 03:06
  • If memory is in short supply for a basic operation like this, you do not have a useful computing environment - the mix of data, machine and language is wrong. – hpaulj Jan 10 '16 at 11:27
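
The "do it in chunks" middle ground from the comments could look roughly like the sketch below. It is not a NumPy built-in: `zero_invalid_chunked` and `chunk_size` are hypothetical names, and the function assumes `arr` is C-contiguous so that `reshape(-1)` returns a view rather than a copy. It uses `np.isin` (NumPy 1.13+) instead of `np.in1d`. Temporary arrays still exist, but their size is bounded by the chunk size, not by the size of `arr`.

```python
import numpy as np

def zero_invalid_chunked(arr, valid, chunk_size=1_000_000):
    """Zero every element of arr that is not in valid, one chunk at a time."""
    # Assumption: a C-contiguous array, so reshape(-1) is a view into arr.
    if not arr.flags.c_contiguous:
        raise ValueError("arr must be C-contiguous")
    flat = arr.reshape(-1)
    for start in range(0, flat.size, chunk_size):
        chunk = flat[start:start + chunk_size]   # view, no data copied
        # The mask (and isin's internal temporaries) cover only this chunk.
        chunk[~np.isin(chunk, valid)] = 0
    return arr

arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
valid = [1, 4]
zero_invalid_chunked(arr, valid, chunk_size=4)
```

A smaller `chunk_size` lowers peak memory at the cost of more Python-level loop iterations.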

Indexing with booleans would work too:

>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> arr[~np.in1d(arr, valid).reshape(arr.shape)] = 0
>>> arr
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])
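
On NumPy 1.13 or later, the same boolean indexing can likely be written without the `reshape`, because `np.isin` returns a mask with the shape of its first argument. The sketch below assumes such a NumPy version; `invert=True` selects the elements that are not in `valid`. It still allocates a temporary mask, just like the version above.

```python
import numpy as np

arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
valid = [1, 4]

# The mask has arr's shape; True marks elements that are NOT in valid.
arr[np.isin(arr, valid, invert=True)] = 0
```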
Mike Müller