How do I select elements of an array given a condition?

Question

Suppose I have two numpy arrays, x = [5, 2, 3, 1, 4, 5] and y = ['f', 'o', 'o', 'b', 'a', 'r']. I want to select the elements in y corresponding to elements in x that are greater than 1 and less than 5.

I tried

x = array([5, 2, 3, 1, 4, 5])
y = array(['f','o','o','b','a','r'])
output = y[x > 1 & x < 5] # desired output is ['o','o','a']

but this doesn't work. How would I do this?

score 268 · Accepted Answer · answered Jun 13 '10 at 00:50

Your expression works if you add parentheses:

>>> y[(1 < x) & (x < 5)]
array(['o', 'o', 'a'], 
      dtype='|S1')
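
For anyone who wants to paste and run this, here is a self-contained sketch of the same thing. The `import numpy as np` line and the intermediate `mask` variable are my additions, and on Python 3 the result dtype shows as `<U1` rather than `|S1`.

```python
import numpy as np

x = np.array([5, 2, 3, 1, 4, 5])
y = np.array(['f', 'o', 'o', 'b', 'a', 'r'])

# Each parenthesised comparison produces a boolean array of the same
# length as x; & then combines the two arrays element by element.
mask = (x > 1) & (x < 5)

# Indexing y with a boolean array keeps only the positions that are True.
output = y[mask]

print(mask)
print(output)
```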

jfs

1

That is nice.. vecMask=1 – MasterControlProgram Nov 03 '16 at 17:08
13

@JennyYueJin: It happens because of precedence. (Bitwise) `&` has higher precedence than `<` and `>`, which in turn have higher precedence than (logical) `and`. `x > 1 and x < 5` evaluates the inequalities first and then the logical conjunction; `x > 1 & x < 5` evaluates the bitwise conjunction of `1` and (the values in) `x`, then the inequalities. `(x > 1) & (x < 5)` forces the inequalities to evaluate first, so all of the operations occur in the intended order and the results are all well-defined. [See docs here.](https://docs.python.org/3/reference/expressions.html#operator-precedence) – calavicci Nov 16 '17 at 17:58
@ru111 It works on Python 3.6 too (there is no reason for it to stop working). – jfs Dec 07 '17 at 20:13
I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" – ru111 Dec 07 '17 at 20:15
@ru111 you should write `(0 < x) & (x < 10)` (as shown in the answer) instead of `0 < x < 10` which doesn't work for numpy arrays on any Python version. – jfs Dec 08 '17 at 14:17
If someone is confused about why the bitwise & operator is used instead of a logical AND, here is a very good explanation: https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html – zardosht Apr 01 '20 at 10:08
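
To make the precedence point from the comments concrete, here is a small sketch (variable names such as `inner` and `unparenthesised_raised` are mine, not from the thread). It shows what `1 & x` evaluates to, and that both the unparenthesised form and the chained form `1 < x < 5` end up asking for the truth value of a whole array, which NumPy refuses.

```python
import numpy as np

x = np.array([5, 2, 3, 1, 4, 5])

# `x > 1 & x < 5` parses as the chained comparison `x > (1 & x) < 5`,
# which Python expands to `(x > (1 & x)) and ((1 & x) < 5)`.
inner = 1 & x  # bitwise AND of 1 with every element of x

try:
    x > 1 & x < 5
    unparenthesised_raised = False
except ValueError:
    # `and` needs a single True/False, but got a multi-element array.
    unparenthesised_raised = True

try:
    1 < x < 5  # expands to `(1 < x) and (x < 5)`, same problem
    chained_raised = False
except ValueError:
    chained_raised = True

# Parentheses force both comparisons to run before the element-wise &.
mask = (x > 1) & (x < 5)
```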

Mark Mikofski · Answer 2 · 2017-08-09T05:52:27.963

43

IMO OP does not actually want np.bitwise_and() (aka &) but actually wants np.logical_and() because they are comparing logical values such as True and False - see this SO post on logical vs. bitwise to see the difference.

>>> x = array([5, 2, 3, 1, 4, 5])
>>> y = array(['f','o','o','b','a','r'])
>>> output = y[np.logical_and(x > 1, x < 5)] # desired output is ['o','o','a']
>>> output
array(['o', 'o', 'a'],
      dtype='|S1')

An equivalent way to do this is with np.all() by setting the axis argument appropriately.

>>> output = y[np.all([x > 1, x < 5], axis=0)] # desired output is ['o','o','a']
>>> output
array(['o', 'o', 'a'],
      dtype='|S1')

by the numbers:

>>> %timeit (a < b) & (b < c)
The slowest run took 32.97 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.15 µs per loop

>>> %timeit np.logical_and(a < b, b < c)
The slowest run took 32.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.17 µs per loop

>>> %timeit np.all([a < b, b < c], 0)
The slowest run took 67.47 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.06 µs per loop

so using np.all() is slower, but & and logical_and are about the same.
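
If you want to repeat the comparison outside IPython, something like the following sketch should work with the standard `timeit` module. The answer does not say what `a`, `b` and `c` were, so the random 1000-element arrays here are purely an assumption, and the absolute numbers will depend on your machine and array size.

```python
import timeit

import numpy as np

# Assumed inputs: the original timings do not state what a, b, c were.
rng = np.random.default_rng(0)
a, b, c = rng.random(1000), rng.random(1000), rng.random(1000)

candidates = {
    "&": lambda: (a < b) & (b < c),
    "logical_and": lambda: np.logical_and(a < b, b < c),
    "all": lambda: np.all([a < b, b < c], axis=0),
}

reference = candidates["&"]()
timings = {}
for name, fn in candidates.items():
    # All three spellings should produce the same boolean mask.
    assert np.array_equal(fn(), reference)
    # Average seconds per call over 1000 calls.
    timings[name] = timeit.timeit(fn, number=1000) / 1000
    print(f"{name:12s} {timings[name] * 1e6:.2f} µs per call")
```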

edited Aug 09 '17 at 05:52

answered Sep 05 '13 at 19:23

Mark Mikofski

7

You need to be a little careful about how you speak about what's evaluated. For example, in `output = y[np.logical_and(x > 1, x < 5)]`, `x < 5` *is* evaluated (possibly creating an enormous array), even though it's the second argument, because that evaluation happens outside of the function. IOW, `logical_and` gets passed two already-evaluated arguments. This is different from the usual case of `a and b`, in which `b` isn't evaluated if `a` is falselike. – DSM Sep 05 '13 at 19:29
16

there is no difference between bitwise_and() and logical_and() for boolean arrays – jfs Apr 13 '14 at 20:07
1

I've been searching ages for the 'or' alternative and this reply gave me some much needed relief! Thank you so much. (np.logical_or), OBVIOUSLY... – J.Massey Nov 26 '20 at 18:12
1

@J.Massey [a pipe `|` (_aka_ `np.bitwise_or`)](https://numpy.org/doc/stable/reference/generated/numpy.bitwise_or.html) might also work, _eg_: `(a < b) | (a > c)` – Mark Mikofski Dec 16 '21 at 17:16
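
A quick sketch of the point made in the comments above: on boolean arrays the bitwise and logical functions agree, and they only diverge on non-boolean input. The small arrays `p`, `q`, `m`, `n` are made up for illustration.

```python
import numpy as np

p = np.array([True, True, False, False])
q = np.array([True, False, True, False])

# On boolean arrays, & / | give the same result as logical_and / logical_or.
same_and = np.array_equal(p & q, np.logical_and(p, q))
same_or = np.array_equal(p | q, np.logical_or(p, q))

# On integer arrays they differ: & works bit by bit, while logical_and
# treats every non-zero value as True.
m = np.array([1, 2, 4])
n = np.array([2, 2, 3])
bitwise = m & n                 # integer result
logical = np.logical_and(m, n)  # boolean result

print(same_and, same_or)
print(bitwise, logical)
```

Since comparisons like `x > 1` always produce boolean arrays, either spelling is fine for building masks.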

Good Will · Answer 3 · 2018-12-26T20:24:32.577

25

To add one detail to @J.F. Sebastian's and @Mark Mikofski's answers:
if you want the corresponding indices (rather than the actual values of the array), the following code will do:

To satisfy all of several conditions:

select_indices = np.where(np.logical_and(x > 1, x < 5))[0]  # 1 < x < 5

To satisfy at least one of several conditions:

select_indices = np.where(np.logical_or(x < 1, x > 5))[0]  # x < 1 or x > 5

edited Dec 26 '18 at 20:24

answered Nov 18 '14 at 16:03

Good Will

3

Note that numpy.where will not just return an array of the indices, but will instead return a tuple (the output of condition.nonzero()) containing arrays - in this case, `(the array of indices you want,)`, so you'll need `select_indices = np.where(...)[0]` to get the result you want and expect. – calavicci Nov 16 '17 at 18:11
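
Here is a short sketch of what that tuple looks like in practice, using the arrays from the question. `np.flatnonzero` is an alternative I am adding here, not something from the answer; for a 1D mask it returns the index array directly.

```python
import numpy as np

x = np.array([5, 2, 3, 1, 4, 5])
y = np.array(['f', 'o', 'o', 'b', 'a', 'r'])

mask = (x > 1) & (x < 5)

# With a single argument, np.where returns a tuple holding one index
# array per dimension; x is 1D, so the tuple has exactly one entry.
result = np.where(mask)
select_indices = result[0]

# np.flatnonzero gives the same indices without the tuple wrapper.
flat_indices = np.flatnonzero(mask)

print(type(result), len(result))
print(select_indices, y[select_indices])
```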

score 6 · Answer 4 · 2017-11-16T21:02:37.430

I like to use np.vectorize for such tasks. Consider the following:

>>> # Arrays
>>> x = np.array([5, 2, 3, 1, 4, 5])
>>> y = np.array(['f','o','o','b','a','r'])

>>> # Function containing the constraints
>>> func = np.vectorize(lambda t: t>1 and t<5)

>>> # Call function on x
>>> y[func(x)]
array(['o', 'o', 'a'], dtype='<U1')

The advantage is you can add many more types of constraints in the vectorized function.

Hope it helps.

edited Nov 16 '17 at 21:02

answered Nov 09 '17 at 06:45

1

This is not a good way to do indexing in NumPy (it will be very slow). – Alex Riley Sep 21 '19 at 14:18
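
For comparison, a sketch of how "many more constraints" can be handled with plain boolean masks instead of `np.vectorize`. Note this swaps in a different technique (`np.logical_and.reduce` over a list of masks); the third condition, `x % 2 == 0`, is invented just to show the pattern.

```python
import numpy as np

x = np.array([5, 2, 3, 1, 4, 5])
y = np.array(['f', 'o', 'o', 'b', 'a', 'r'])

# The np.vectorize version from the answer: the lambda is called once
# per element in a Python-level loop.
func = np.vectorize(lambda t: t > 1 and t < 5)
vectorize_mask = func(x)

# Mask-based version: each condition is one whole-array operation.
basic_mask = np.logical_and.reduce([x > 1, x < 5])

# Adding constraints is just a matter of extending the list
# (the evenness test is a made-up extra condition).
extended_mask = np.logical_and.reduce([x > 1, x < 5, x % 2 == 0])

print(y[vectorize_mask], y[basic_mask], y[extended_mask])
```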

score 1 · Answer 5 · edited Jun 29 '17 at 23:56

Actually I would do it this way:

L1 is the list of indices of the elements satisfying condition 1 (you can get it with np.where(condition1)[0], for example).

Similarly, you get L2, the list of indices of the elements satisfying condition 2.

Then you find the intersection of the two index lists, for example with np.intersect1d(L1, L2).

You can also intersect more than two lists if you have more conditions to satisfy.

Then you can use the resulting indices to index any other array, for example, x.
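
The steps above can be sketched roughly as follows, assuming NumPy's `np.intersect1d` is what is meant by "intersect" and using the arrays from the question. The third condition and the use of `functools.reduce` for more than two lists are my own illustration.

```python
from functools import reduce

import numpy as np

x = np.array([5, 2, 3, 1, 4, 5])
y = np.array(['f', 'o', 'o', 'b', 'a', 'r'])

# Index arrays for each condition separately.
L1 = np.where(x > 1)[0]
L2 = np.where(x < 5)[0]

# Indices that appear in both arrays (returned sorted and unique).
common = np.intersect1d(L1, L2)
print(y[common])

# With more conditions, fold intersect1d over all the index arrays.
L3 = np.where(x % 2 == 0)[0]  # hypothetical extra condition
common_all = reduce(np.intersect1d, [L1, L2, L3])
print(y[common_all])
```

This builds one index array per condition, so combining boolean masks with `&` is usually the more direct route.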

score 0 · Answer 6 · answered Feb 08 '19 at 21:56

For 2D arrays, you can do this. Create a 2D mask using the condition. Typecast the condition mask to int or float, depending on the array, and multiply it with the original array.

In [8]: arr
Out[8]: 
array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.]])

In [9]: arr*(arr % 2 == 0).astype(int)
Out[9]: 
array([[ 0.,  2.,  0.,  4.,  0.],
       [ 6.,  0.,  8.,  0., 10.]])
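
A runnable sketch of the same idea, with two variations that I am adding for contrast: `np.where(mask, arr, 0)` gives the same zero-filled result without the cast, and plain boolean indexing `arr[mask]` returns only the matching values as a flat 1D array. The construction of `arr` via `np.arange` is assumed from the output shown above.

```python
import numpy as np

arr = np.arange(1.0, 11.0).reshape(2, 5)
mask = arr % 2 == 0  # 2D boolean mask, same shape as arr

# Multiplying by the mask keeps the 2D shape and zeroes the rest.
zeroed = arr * mask.astype(int)

# Equivalent result: pick from arr where the mask is True, else 0.
zeroed_where = np.where(mask, arr, 0)

# Boolean indexing drops the non-matching elements and flattens.
selected = arr[mask]

print(zeroed)
print(selected)
```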
