
I have two NumPy arrays of different sizes. In theory, the smaller one is a subset of the larger one. I want to find the indices in the larger array at which its values match those of the smaller array.

For example:

    A = np.array([  7.52,   8.32,  16.96,  20.05, -24.96, -42.69, -47.47,  55.04, -57.62,   2.03,
                   61.94,  64.41, -71.3,   93.6,  151.65, 151.75,  -0.43,  -3.18,   4.59,  -5.55,
                    6.44,  -9.48,   9.31,   0.67, -14.34,  -8.09,  16.23,  17.69,  19.46,  23.52,
                  -52.59])

    B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6])

Two nested for loops would do the job, but I want to know if there is a faster, more Pythonic way to do this.

I tried it with a single loop, but it does not work (I suspect numpy.where does not work with arrays of different sizes):

    def get_index(self, lst_1, lst_2):
        tmp_list = list()
        for i in range(min(len(lst_1), len(lst_2))):
            if np.where(lst_2[i] == lst_1):
                tmp_list.append(i)

        return tmp_list
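For comparison, here is a minimal sketch of a loop-based version that does work (the standalone function name `get_index_loop` is hypothetical, not the poster's code). The attempt above goes wrong for two reasons: `np.where` returns a tuple of index arrays, and a non-empty tuple is always truthy, so the `if` never filters anything; and the loop runs over `min(len(...))` positions instead of over the values of the smaller array.

```python
import numpy as np

A = np.array([7.52, 8.32, 16.96, 20.05, -24.96, -42.69, -47.47, 55.04, -57.62, 2.03,
              61.94, 64.41, -71.3, 93.6, 151.65, 151.75, -0.43, -3.18, 4.59, -5.55,
              6.44, -9.48, 9.31, 0.67, -14.34, -8.09, 16.23, 17.69, 19.46, 23.52,
              -52.59])
B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6])


def get_index_loop(large, small):
    """Hypothetical helper: indices in `large` whose values occur in `small`."""
    indices = []
    # Loop over the values of the smaller array, not over min(len(...)) positions
    for value in small:
        # np.where returns a tuple of index arrays; [0] selects the 1-D array
        matches = np.where(large == value)[0]
        indices.extend(matches.tolist())
    return sorted(indices)


print(get_index_loop(A, B))  # [10, 13, 16, 19, 26, 28]
```

This still loops in Python once per element of `B`, so it is mainly useful for understanding what the vectorized answer below replaces.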

Any suggestions will be appreciated :)

Thank you

Max Voitko
blackbug

1 Answer


You can use np.in1d to test which elements of A also appear in B; it returns a boolean array with the same length as A:

    >>> np.in1d(A, B)
    array([False, False, False, False, False, False, False, False, False,
           False,  True, False, False,  True, False, False,  True, False,
           False,  True, False, False, False, False, False, False,  True,
           False,  True, False, False])
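Since NumPy 2.0, np.in1d is deprecated and np.isin is the recommended replacement; for 1-D inputs like these it produces the same mask. A short sketch, assuming A and B are built as NumPy arrays from the values in the question:

```python
import numpy as np

A = np.array([7.52, 8.32, 16.96, 20.05, -24.96, -42.69, -47.47, 55.04, -57.62, 2.03,
              61.94, 64.41, -71.3, 93.6, 151.65, 151.75, -0.43, -3.18, 4.59, -5.55,
              6.44, -9.48, 9.31, 0.67, -14.34, -8.09, 16.23, 17.69, 19.46, 23.52,
              -52.59])
B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6])

# Boolean mask, one entry per element of A: True where that value also occurs in B
mask = np.isin(A, B)
print(mask.shape)  # (31,)
print(mask.sum())  # 6
```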

Then, to get the actual indices, use that mask to select from an array of positions:

    >>> np.arange(A.shape[0])[np.in1d(A, B)]
    array([10, 13, 16, 19, 26, 28])
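Membership tests like np.in1d and np.isin use exact equality, which works here because the values in B are copied verbatim from A. If B's values come out of a separate floating-point computation, tiny rounding differences can make them miss. A sketch of a tolerance-based alternative (an addition on top of the original answer, not part of it) compares every pair with np.isclose via broadcasting; it builds a len(A) by len(B) intermediate array, so it suits moderately sized inputs. The `+ 1e-12` offset below is a simulated rounding error, not real data.

```python
import numpy as np

A = np.array([7.52, 8.32, 16.96, 20.05, -24.96, -42.69, -47.47, 55.04, -57.62, 2.03,
              61.94, 64.41, -71.3, 93.6, 151.65, 151.75, -0.43, -3.18, 4.59, -5.55,
              6.44, -9.48, 9.31, 0.67, -14.34, -8.09, 16.23, 17.69, 19.46, 23.52,
              -52.59])
# Simulated: B's values drifted by a tiny rounding error after some arithmetic
B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6]) + 1e-12

# Exact membership finds nothing once the values differ in the last bits
print(np.isin(A, B).any())  # False

# Compare every element of A with every element of B (shape (31, 6)) using
# np.isclose's default tolerances, then keep rows of A close to any B value
close = np.isclose(A[:, None], B[None, :])
idx = np.flatnonzero(close.any(axis=1))
print(idx)  # [10 13 16 19 26 28]
```

The default tolerances of np.isclose (rtol=1e-05, atol=1e-08) are an assumption here; tighten them if nearby values in A must not be confused.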

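Note: np.in1d is vectorized, so it avoids Python-level loops and stays fast even for large arrays. It is also easy to get the opposite (elements of A that are not in B): use np.in1d(A, B, invert=True) for the mask, or np.arange(A.shape[0])[~np.in1d(A, B)] for the indices.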
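A sketch of the "opposite" query, written with np.isin (the replacement for np.in1d in NumPy 2.0 and later), which accepts the same invert argument. It assumes A and B hold the values from the question:

```python
import numpy as np

A = np.array([7.52, 8.32, 16.96, 20.05, -24.96, -42.69, -47.47, 55.04, -57.62, 2.03,
              61.94, 64.41, -71.3, 93.6, 151.65, 151.75, -0.43, -3.18, 4.59, -5.55,
              6.44, -9.48, 9.31, 0.67, -14.34, -8.09, 16.23, 17.69, 19.46, 23.52,
              -52.59])
B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6])

# invert=True flips the mask: True where the element of A is NOT in B
not_in_B = np.isin(A, B, invert=True)

# Same result as negating the ordinary mask with ~
assert (not_in_B == ~np.isin(A, B)).all()

# Indices of the non-matching elements
idx_not_in_B = np.arange(A.shape[0])[not_in_B]
print(len(idx_not_in_B))  # 25
```

Using invert=True rather than ~ avoids building a second temporary array, though for arrays of this size the difference is negligible.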

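EDIT: As suggested in the comments, a more direct way to get the indices (which I somehow missed!) is np.nonzero(np.in1d(A, B)). It returns a tuple with one index array per dimension, so for a 1-D A use np.nonzero(np.in1d(A, B))[0], or np.flatnonzero(np.in1d(A, B)).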
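A short sketch of the np.nonzero route, using np.isin in place of the deprecated np.in1d and assuming A and B hold the values from the question. It shows that np.nonzero wraps the indices in a tuple, and that all three ways of extracting them agree:

```python
import numpy as np

A = np.array([7.52, 8.32, 16.96, 20.05, -24.96, -42.69, -47.47, 55.04, -57.62, 2.03,
              61.94, 64.41, -71.3, 93.6, 151.65, 151.75, -0.43, -3.18, 4.59, -5.55,
              6.44, -9.48, 9.31, 0.67, -14.34, -8.09, 16.23, 17.69, 19.46, 23.52,
              -52.59])
B = np.array([61.94, 16.23, 19.46, -5.55, -0.43, 93.6])

mask = np.isin(A, B)

# np.nonzero returns a tuple with one index array per dimension of the input
result = np.nonzero(mask)
print(type(result), len(result))  # <class 'tuple'> 1

# For a 1-D input, take element [0], or call np.flatnonzero directly
from_nonzero = result[0]
from_flat = np.flatnonzero(mask)
from_arange = np.arange(A.shape[0])[mask]
print(from_flat)  # [10 13 16 19 26 28]
```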

Sayandip Dutta