Replace all elements of NumPy array that are greater than some value

Question

I have a 2D NumPy array. How do I replace all values in it greater than a threshold T = 255 with a value x = 255? A slow for-loop based method would be:

# arr = arr.copy()  # Optionally, do not modify original arr.

for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        if arr[i, j] > 255:
            arr[i, j] = x

For more information, take a look at [this intro to indexing](http://docs.scipy.org/doc/numpy/user/basics.indexing.html). — askewchan, Oct 29 '13 at 19:25

score 459 · Accepted Answer · edited Apr 27 '19 at 22:52

I think both the fastest and most concise way to do this is to use NumPy's built-in boolean mask indexing (a form of advanced, or "fancy", indexing). If you have an ndarray named arr, you can replace all elements >255 with a value x as follows:

arr[arr > 255] = x

I ran this on my machine with a 500 x 500 random matrix, replacing all values >0.5 with 5; the best of three timing runs was 7.59 ms per loop.

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
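As a minimal sketch of what the one-liner does, the example below uses a small made-up array and a threshold `T` that differs from the replacement `x` (both values are illustrative assumptions, not taken from the question). The comparison builds a boolean mask, and assigning through the mask overwrites only the selected elements in place.

```python
import numpy as np

# Illustrative values; T and x are deliberately different here.
T, x = 255, 0
arr = np.array([[100, 300],
                [255, 256]])

mask = arr > T      # boolean array with the same shape as arr
arr[mask] = x       # in place: only elements where mask is True change

print(mask)         # [[False  True] [False  True]]
print(arr)          # [[100   0] [255   0]]
```

Because the comparison is strict, the element equal to `T` is left alone.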

answered Oct 29 '13 at 18:46 by mdml · edited by kmario23

Note that this modifies the existing array `arr`, instead of creating a `result` array as in the OP. – askewchan Oct 29 '13 at 20:01
Is there a way to do this by not modifying `A` but creating a new array? – sodiumnitrate Aug 25 '15 at 23:12
What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2? – lavee_singh Oct 07 '15 at 19:01
100 loops, best of 3: 2.22 ms per loop – dreab Sep 28 '16 at 20:31
NOTE: this doesn't work if the data is in a Python list; it HAS to be in a NumPy array (`np.array([1,2,3])`) – mjp May 08 '17 at 14:28
@mdml The np.place method is faster than this. timeit A[A>0.5] = 5 :- 1.79 ms ± 6.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) and timeit np.place(A, A>0, 5) :- 732 µs ± 5.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) – Divyang Vashi Jul 28 '18 at 19:40
Is there a way to modify this if `arr` includes NaN values? – Darcy Oct 10 '19 at 17:55
is it possible to use this indexing to update every value without condition? I want to do this: `array[ ? ] = x`, setting every value to x. Secondly, is it possible to do multiple conditions like: `array[ ? ] = 255 if array[i] > 127 else 0` I want to optimize my code and am currently using list comprehension which was dramatically slower than this fancy indexing. – AgentM Oct 17 '19 at 15:33
For not modifying the original array, do a deep copy of the original array. arr2 = arr.copy() and then arr2[arr2 > 255] = x – Debjit Bhowmick Jun 11 '20 at 02:03
For massive arrays, this solution will likely not be workable as it creates an intermediate array in-memory equal in size to the input array. If you do not have sufficient memory on your system it will fail. – corvus Oct 29 '21 at 16:50
@askewchan answer of using `result = np.minimum(arr, 255)` is the best for performance in my test. – Muhammad Yasirroni Jan 31 '22 at 14:14
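Several of the questions raised in the comments above can be illustrated with one small sketch. The array values are made up, and a float array is assumed because NaN cannot be stored in an integer array. Comparisons against NaN evaluate to False, so the mask leaves NaN entries untouched. `np.where` builds a new array for the "one value or the other" case, and `arr[...] = x` sets every element.

```python
import numpy as np

arr = np.array([10.0, 300.0, np.nan, 128.0])

# New array, original untouched: copy first, then mask the copy.
capped = arr.copy()
capped[capped > 255] = 255

# Two-way choice per element: 255 where arr > 127, else 0.
# NaN > 127 is False, so the NaN position gets 0 here.
binary = np.where(arr > 127, 255, 0)

# Unconditional assignment to every element.
filled = arr.copy()
filled[...] = 7

print(capped)
print(binary)
print(filled)
```

If NaN entries should be replaced as well, one option is to combine the mask with `np.isnan(arr)` using `|`.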

score 62 · Answer 2 · edited Aug 19 '23 at 23:16

If you want a new array result containing a copy of arr wherever arr < 255, and 255 otherwise:

result = np.minimum(arr, 255)

More generally, for a lower and/or upper bound:

result = np.clip(arr, 0, 255)

If you just want to access the values over 255, or something more complicated, @mtitan8's answer is more general, but np.clip and np.minimum (or np.maximum) are nicer and much faster for your case:

In [292]: timeit np.minimum(a, 255)
100000 loops, best of 3: 19.6 µs per loop

In [293]: %%timeit
   .....: c = np.copy(a)
   .....: c[a>255] = 255
   .....: 
10000 loops, best of 3: 86.6 µs per loop

If you want to do it in-place (i.e., modify arr instead of creating result) you can use the out parameter of np.minimum:

np.minimum(arr, 255, out=arr)

or

np.clip(arr, 0, 255, arr)

(The out= keyword is optional because the arguments are passed in the same order as in the function's definition.)
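A short sketch of the in-place variants, using a made-up integer array: with `out=arr`, the result is written into `arr` itself, and `np.minimum` hands back that same array object rather than a copy.

```python
import numpy as np

arr = np.array([-5, 100, 300])

# Cap the top only; the ufunc writes into arr and returns that same array.
returned = np.minimum(arr, 255, out=arr)
print(returned is arr)   # True
print(arr)               # [ -5 100 255]

# Clamp both ends in place.
np.clip(arr, 0, 255, out=arr)
print(arr)               # [  0 100 255]
```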

For in-place modification, the boolean indexing speeds up a lot (without having to make and then modify the copy separately), but is still not as fast as minimum:

In [328]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: np.minimum(a, 255, a)
   .....: 
100000 loops, best of 3: 303 µs per loop

In [329]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: a[a>255] = 255
   .....: 
100000 loops, best of 3: 356 µs per loop

For comparison, if you wanted to restrict your values with a minimum as well as a maximum, without clip you would have to do this twice, with something like

np.minimum(a, 255, a)
np.maximum(a, 0, a)

or,

a[a>255] = 255
a[a<0] = 0
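The sketch below checks that the three approaches agree on a small made-up array. The two-step versions are written without `out=` here so the input stays available for comparison.

```python
import numpy as np

a = np.array([-20, 0, 128, 255, 400])

# One call handles both bounds.
clipped = np.clip(a, 0, 255)

# Two ufunc calls: cap the top, then raise the bottom.
two_step = np.maximum(np.minimum(a, 255), 0)

# Two boolean-mask assignments on a copy.
masked = a.copy()
masked[masked > 255] = 255
masked[masked < 0] = 0

print(clipped)   # [  0   0 128 255 255]
print(np.array_equal(clipped, two_step), np.array_equal(clipped, masked))
```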

Thank you very much for your complete comment, however np.clip and np.minimum do not seem to be what I need in this case, in the OP you see that the threshold T and the replacement value (255) are not necessarily the same number. However I still gave you an up vote for thoroughness. Thanks again. — NLi10Me, Oct 30 '13 at 03:31
What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2? — lavee_singh, Oct 07 '15 at 19:01
@lavee_singh, to do that, you can use the third part of the slice, which is usually neglected: `a[start:stop:step]` gives you the elements of the array from `start` to `stop`, but instead of every element, it takes only every `step` (if neglected, it is `1` by default). So to set all the evens to zero, you could do `a[::2] = 0` — askewchan, Oct 08 '15 at 03:02
Thanks I needed something, like this, even though I knew it for simple lists, but I didn't know whether or how it works for numpy.array. — lavee_singh, Oct 08 '15 at 06:48
Surprisingly in my investigation, `a = np.maximum(a,0)` is faster than `np.maximum(a,0,out=a)`. — Muhammad Yasirroni, Jan 31 '22 at 14:12
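To illustrate the point raised in the first comment above: `np.minimum` and `np.clip` only reproduce the masked assignment when the replacement value equals the threshold. A minimal sketch with a hypothetical threshold `T` and replacement `x` that differ:

```python
import numpy as np

T, x = 255, 0   # hypothetical values: threshold and replacement differ
arr = np.array([100, 255, 300])

capped = np.minimum(arr, T)            # values above T become T, not x
replaced = np.where(arr > T, x, arr)   # values above T become x

print(capped)     # [100 255 255]
print(replaced)   # [100 255   0]
```

When `T` and `x` differ, boolean indexing or `np.where` is the approach that matches the question.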

score 22 · Answer 3 · edited Jan 09 '17 at 21:30

I think you can achieve this the quickest by using the where function:

For example, to find the items greater than 0.2 in a NumPy array and replace them with 0:

import numpy as np

nums = np.random.rand(4,3)

print(np.where(nums > 0.2, 0, nums))
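Unlike the masked assignment, `np.where` with three arguments returns a new array and leaves its input alone. A minimal sketch with fixed, made-up values instead of random ones so the result is reproducible:

```python
import numpy as np

nums = np.array([[0.1, 0.5],
                 [0.9, 0.2]])

# Where the condition holds take 0, otherwise keep the value from nums.
result = np.where(nums > 0.2, 0, nums)

print(result)   # [[0.1 0. ] [0.  0.2]]
print(nums)     # unchanged
```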

answered Jan 09 '17 at 21:13 by Amir F · edited by Bart

score 16 · Answer 4 · edited May 21 '20 at 09:02

Another way is to use np.place, which does in-place replacement and works with multidimensional arrays:

import numpy as np

# create 2x3 array with numbers 0..5
arr = np.arange(6).reshape(2, 3)

# replace 0 with -10
np.place(arr, arr == 0, -10)
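One detail worth knowing, sketched here on a made-up array: `np.place` returns `None` and modifies its first argument. When given a sequence of values, it uses them in order for the masked positions and repeats them if there are fewer values than positions.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Three elements satisfy the mask; the two values are cycled to fill them.
returned = np.place(arr, arr > 2, [44, 55])

print(returned)   # None
print(arr)        # [[ 0  1  2] [44 55 44]]
```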

answered Dec 21 '17 at 06:57 by Shital Shah

This is the solution I used because it was the first I came across. I wonder if there is a big difference between this and the selected answer above. What do you think? – jonathanking Feb 18 '18 at 16:44
In my very limited tests, my above code with np.place is running 2X slower than accepted answer's method of direct indexing. It's surprising because I would have thought np.place would be more optimized but I guess they have probably put more work on direct indexing. – Shital Shah Jun 28 '18 at 09:30
In my case `np.place` was also slower compared to the built-in method, although the opposite is claimed in [this](https://stackoverflow.com/questions/19666626/replace-all-elements-of-python-numpy-array-that-are-greater-than-some-value#comment90115025_19666680) comment. – riyansh.legend May 20 '20 at 07:37

score 15 · Answer 5 · edited May 07 '16 at 10:24

You can consider using numpy.putmask:

np.putmask(arr, arr > T, 255.0)

Here is a performance comparison with NumPy's built-in indexing:

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)

In [3]: timeit np.putmask(A, A>0.5, 5)
1000 loops, best of 3: 1.34 ms per loop

In [4]: timeit A[A > 0.5] = 5
1000 loops, best of 3: 1.82 ms per loop
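A minimal sketch of `np.putmask` on a made-up array, with a hypothetical threshold `T` and replacement `x`. With a scalar value it behaves like the masked assignment. With a sequence of values it differs from `np.place`: `np.putmask` takes the value at the same position as each masked element, whereas `np.place` consumes the values from the start.

```python
import numpy as np

T, x = 255, 0   # hypothetical threshold and replacement
arr = np.array([[100, 300],
                [255, 256]])
np.putmask(arr, arr > T, x)
print(arr)      # [[100   0] [255   0]]

vals = [10, 20, 30, 40, 50]

a = np.arange(5)
np.putmask(a, a > 1, vals)   # positions 2, 3, 4 receive vals[2], vals[3], vals[4]
print(a)        # [ 0  1 30 40 50]

b = np.arange(5)
np.place(b, b > 1, vals)     # positions 2, 3, 4 receive vals[0], vals[1], vals[2]
print(b)        # [ 0  1 10 20 30]
```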

answered May 07 '16 at 10:05 by lev

I tested the code with an upper limit of `0.5` instead of `5`, and indexing was about twice as fast as `np.putmask`. – Ali_Sh Dec 25 '21 at 22:51

score 9 · Answer 6 · edited Mar 23 '18 at 19:34

You can also use &, | (and/or) for more flexibility:

values between 5 and 10: A[(A>5)&(A<10)]

values greater than 10 or smaller than 5: A[(A<5)|(A>10)]
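A short sketch of combined conditions on a made-up array, including assignment through the combined mask. The parentheses around each comparison are needed because `&` and `|` bind more tightly than `>` and `<` in Python.

```python
import numpy as np

A = np.arange(12)

between = A[(A > 5) & (A < 10)]   # values strictly between 5 and 10
outside = A[(A < 5) | (A > 10)]   # values below 5 or above 10

# The same combined mask can be used for replacement (on a copy here).
B = A.copy()
B[(B > 5) & (B < 10)] = 0

print(between)   # [6 7 8 9]
print(outside)   # [ 0  1  2  3  4 11]
print(B)         # [ 0  1  2  3  4  5  0  0  0  0 10 11]
```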

answered Mar 23 '18 at 19:16 by Mahdi Shahbaba · edited by Dmitriy

score 6 · Answer 7 · answered Feb 01 '22 at 20:09

np.where() works great!

np.where(arr > 255, 255, arr)

example:

FF = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
np.where(FF == 1, '+', '-')
Out[]: 
array([['-', '-'],
       ['+', '-'],
       ['-', '+'],
       ['+', '+']], dtype='<U1')

answered Feb 01 '22 at 20:09 by dougeemetcalf

np.where is a great solution, it doesn't mutate the arrays involved, and it's also directly compatible with pandas series objects. Really helped me. – AndrewJaeyoung Mar 21 '22 at 23:04

score 4 · Answer 8 · edited Jul 10 '21 at 19:35

Let us assume you have a NumPy array that contains the values from 0 up to 20 and you want to replace numbers greater than 10 with 0:

import numpy as np

my_arr = np.arange(0,21) # creates an array
my_arr[my_arr > 10] = 0 # modifies the value

Note, however, that this modifies the original array. To avoid overwriting it, use arr.copy() to create a detached copy of the original array and modify that instead.

import numpy as np

my_arr = np.arange(0,21)
my_arr_copy = my_arr.copy() # creates a copy of the original array

my_arr_copy[my_arr_copy > 10] = 0
