2

I am trying to use numpy.where to find the indices I want. Here's the code:

import numpy as np
a = np.array([20,58,32,0,107,57]).reshape(2,3)
item_index = np.where((a == 58) | (a == 107) | (a == 20))
print(item_index)

I get item_index as below:

(array([0, 0, 1]), array([0, 1, 1]))

However, in reality, a has dimensions 20000 x 7 and there are several hundred conditions instead of just three. Is there a way to use numpy.where for multiple conditions? I found topics here, here and here useful, but I couldn't find the answer to my question.

ahoosh
  • I'd say the problem you're having isn't connected to `where`. The problem you're having is efficiently compressing several hundred equality conditions into one short, efficient condition. – user2357112 Jul 30 '14 at 03:23
  • @user2357112 I agree. I will likely edit the title. In the solutions provided, other users did not use `where` at all and mostly used `np.in1d` – ahoosh Jul 30 '14 at 03:46

3 Answers

3

Given (per your example):

>>> a
array([[ 20,  58,  32],
       [  0, 107,  57]])

with the query, 'is an array element of a in a list of values', just use numpy.in1d:

>>> np.in1d(a, [58, 107, 20])
array([ True,  True, False, False,  True, False], dtype=bool)

If you want the result to have the same shape (and therefore the same indexes) as the underlying array, just reshape to the shape of a:

>>> np.in1d(a, [58, 107, 20]).reshape(a.shape)
array([[ True,  True, False],
       [False,  True, False]], dtype=bool)

Then test against that:

>>> tests=np.in1d(a, [58, 107, 20]).reshape(a.shape)
>>> tests[1,1]                 # is the element of 'a' in the list [58, 107, 20]?
True

In one line (obvious, but I do not know whether it is efficient for one-off queries):

>>> np.in1d(a, [58, 107, 20]).reshape(a.shape)[1,1]
True
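To get back the index tuple the question asked for, the boolean mask can be passed straight to np.where. The sketch below assumes NumPy 1.13 or newer, where np.isin is available; it keeps the shape of its first argument, so the reshape step is not needed (np.in1d is deprecated as of NumPy 2.0). The names `values` and `mask` are just illustrative.

```python
import numpy as np

a = np.array([20, 58, 32, 0, 107, 57]).reshape(2, 3)
values = [58, 107, 20]  # could just as well hold several hundred entries

# np.isin returns a boolean array with the same shape as `a`
mask = np.isin(a, values)

# np.where with a single argument returns one index array per dimension
item_index = np.where(mask)
print(item_index)  # (array([0, 0, 1]), array([0, 1, 1]))
```

This reproduces the item_index from the question without chaining equality tests with `|`.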
dawg
2

Someone better at numpy may have a better solution - but if you have pandas installed you could do something like this.

import pandas as pd
df = pd.DataFrame(a) # Create a pandas dataframe from array

conditions = [58, 107, 20]
item_index = df.isin(conditions).values.nonzero()

isin builds a boolean array which is True if the value is in the conditions list. The call to .values extracts the underlying numpy array from the pandas DataFrame. The call to nonzero() returns the indices of the True elements as a tuple of arrays, one per dimension.
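As a rough illustration of what nonzero() gives back on a boolean array, here is a numpy-only sketch (no pandas needed; the mask is built with np.isin as a stand-in for df.isin(conditions).values, which is an assumption on my part rather than part of the answer above):

```python
import numpy as np

a = np.array([20, 58, 32, 0, 107, 57]).reshape(2, 3)
mask = np.isin(a, [58, 107, 20])  # stand-in for df.isin(conditions).values

# nonzero() returns a tuple of index arrays: (row indices, column indices)
rows, cols = mask.nonzero()
print(rows, cols)  # [0 0 1] [0 1 1]

# The same tuple can be used to pull the matching values back out of `a`
matched = a[rows, cols]
print(matched)  # [ 20  58 107]

# np.argwhere gives the same positions as (row, col) pairs instead
pairs = np.argwhere(mask)
```

So the result is a set of positions, not a 0/1 version of the mask.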

chrisb
  • The same can be achieved in numpy alone using `np.in1d` and some magic with the indexing: `np.unravel_index(np.in1d(a, [58, 107, 20]).nonzero()[0], a.shape)` – Jaime Jul 30 '14 at 01:36
  • @chrisb I'm thinking to use `pandas` at some point and your solution would totally work using pandas. – ahoosh Jul 30 '14 at 03:43
  • @Jaime Solutions similar to yours were provided by two other members. I tested it and it totally works. – ahoosh Jul 30 '14 at 03:44
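For reference, the unravel_index idea from the comment above can be spelled out step by step. This is only a sketch: it uses np.isin and np.flatnonzero in place of the np.in1d(...).nonzero()[0] spelling, which should be equivalent here, and the name `flat_idx` is made up for the example.

```python
import numpy as np

a = np.array([20, 58, 32, 0, 107, 57]).reshape(2, 3)
values = [58, 107, 20]

# Positions of the matches in the flattened (1-D) view of `a`
flat_idx = np.flatnonzero(np.isin(a, values))
print(flat_idx)  # [0 1 4]

# Convert the flat positions back into (row, column) index arrays
item_index = np.unravel_index(flat_idx, a.shape)
print(item_index)  # (array([0, 0, 1]), array([0, 1, 1]))
```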
2

Add another dimension to each so they can be broadcast against each other:

>>> a = np.array([20,58,32,0,107,57]).reshape(2,3)
>>> b = np.array([58, 107, 20])
>>> np.any(a[...,np.newaxis] == b[np.newaxis, ...], axis = 2)
array([[ True,  True, False],
       [False,  True, False]], dtype=bool)
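One thing to keep in mind with the broadcasting approach is that it materialises an intermediate boolean array with one extra axis as long as the list of values. A small sketch of that, assuming the same a and b as above (the name `pairwise` is only for illustration), with a check against np.isin:

```python
import numpy as np

a = np.array([20, 58, 32, 0, 107, 57]).reshape(2, 3)
b = np.array([58, 107, 20])

# Every element of `a` is compared with every element of `b`:
# the intermediate result has shape a.shape + (len(b),)
pairwise = a[..., np.newaxis] == b
print(pairwise.shape)  # (2, 3, 3)

# Collapse the last axis: True where the element matched any value in `b`
mask = pairwise.any(axis=-1)

# Same mask as the set-membership approach
print(np.array_equal(mask, np.isin(a, b)))  # True
```

For a 20000 x 7 array and a few hundred values, that intermediate would hold 20000 * 7 * len(b) booleans, so it may be worth comparing memory use and timing against np.isin before settling on one.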
wwii