42

I have the following array:

a=np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

I want to convert this vector to a binary vector based on a threshold. Taking threshold=0.5 as an example, elements greater than 0.5 should become 1, and all others 0.
The output vector should look like this:

a_output = [0, 0, 0, 1, 1, 1]

How can I do this?

cs95
freefrog
    Does this answer your question? [Replacing Numpy elements if condition is met](https://stackoverflow.com/questions/19766757/replacing-numpy-elements-if-condition-is-met) – Georgy May 06 '20 at 14:00

2 Answers

81

np.where

np.where(a > 0.5, 1, 0)
# array([0, 0, 0, 1, 1, 1])

Boolean masking with astype

(a > .5).astype(int)
# array([0, 0, 0, 1, 1, 1])

np.select

np.select([a <= .5, a>.5], [np.zeros_like(a), np.ones_like(a)])
# array([ 0.,  0.,  0.,  1.,  1.,  1.])
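
The float output above comes from `np.zeros_like(a)` and `np.ones_like(a)` inheriting the dtype of `a`. If you want integers instead, one option (a minimal sketch, assuming an integer result is what you need) is to pass `dtype=int` to both:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# zeros_like/ones_like copy a's float dtype unless dtype is given explicitly
zeros = np.zeros_like(a, dtype=int)
ones = np.ones_like(a, dtype=int)

# Each condition picks from the choice array at the same position
result = np.select([a <= 0.5, a > 0.5], [zeros, ones])
print(result)
```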

Special case: np.round

This is the most concise option, but it only applies if your array values are floats between 0 and 1 and your threshold is 0.5. Note that the result keeps the float dtype.

a.round()
# array([0., 0., 0., 1., 1., 1.])
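
To illustrate why this is a special case, here is a small sketch using a different, made-up input array (not the one from the question). NumPy rounds exact halves to the nearest even value, so 0.5 becomes 0, which matches a strict `> 0.5` threshold, but values outside [0, 1] no longer give a binary result:

```python
import numpy as np

# Illustrative input: includes an exact 0.5 and values above 1
a = np.array([0.1, 0.5, 0.9, 1.5, 2.5])

# Halves round to the nearest even value: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2
rounded = a.round()

# A boolean mask stays binary regardless of the value range
masked = (a > 0.5).astype(int)

print(rounded)
print(masked)
```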
cs95
  • Which one is fastest for working with dtype 'uint8'? The `np.select` approach returns a 'uint8' by default but looks a lot more involved. – user3613932 May 26 '19 at 01:31
  • If you wish to get a Numpy array of dtype 'bool' then `a > 0.5` will give you that. – user3613932 May 26 '19 at 01:34
  • @user3613932 I wrote this answer a while back but if I'm not mistaken you can get a bool mask with a > .5 and then call .view to change the view to uint8 which should be faster than astype... Take a look and let me know if it doesn't work. – cs95 May 26 '19 at 01:35
  • Based on [official documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.view.html) not sure how safe `view` is compared to `astype`. `astype` creates a new copy whereas `view` leverages the memory of the original variable. – user3613932 May 26 '19 at 05:04
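
The `view` versus `astype` difference discussed in the comments above can be checked directly. This is only a sketch of the memory-sharing behaviour, not a benchmark. It relies on NumPy booleans being one byte wide, so they can be reinterpreted as `uint8`:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
mask = a > 0.5

# view reinterprets the same buffer; astype allocates a new array
as_view = mask.view(np.uint8)
as_copy = mask.astype(np.uint8)

print(np.shares_memory(mask, as_view))
print(np.shares_memory(mask, as_copy))

# Changing the mask afterwards shows up in the view but not in the copy
mask[0] = True
print(as_view[0], as_copy[0])
```

So `view` avoids a copy, but the result is tied to the original mask. If the mask is a temporary that is never modified, that is harmless.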
0

You could use binarize from the sklearn.preprocessing module.

However, this works only if you want your final values to be binary, i.e. 0 or 1. The answers above also work well for non-binary results.

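import numpy as np
from sklearn.preprocessing import binarize

# binarize expects a 2D array, so reshape the vector into a single row
a = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9]).reshape(1, -1)
# threshold defaults to 0.0, so pass it explicitly; values > 0.5 become 1
x = binarize(a, threshold=0.5)
a_output = np.ravel(x)
print(a_output)

# everything together
a_output = np.ravel(binarize(a.reshape(1, -1), threshold=0.5))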
conflicted_user