
I am trying to find the most efficient way to get the indexes of nested arrays in another array.

import numpy as np
#                     0     1      2     3
haystack = np.array([[1,3],[3,4,],[5,6],[7,8]])
needles  = np.array([[3,4],[7,8]])

Given the arrays contained in needles, I want to find their indexes in haystack. In this case, 1 and 3.

I came up with this solution:

 indexes = [idx for idx,elem in enumerate(haystack) if elem in needles ]

This is wrong, because it is enough for a single element of elem to match an element of needles for idx to be returned.
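
To make the problem concrete, here is a minimal sketch. It assumes a slightly modified haystack (the row [3, 9] is made up for illustration) and relies on NumPy evaluating `elem in needles` effectively as `(needles == elem).any()`, so one matching position is enough. The names `loose` and `strict` are hypothetical.

```python
import numpy as np

# Illustrative data: row 1 shares only its first value with the needle [3, 4]
haystack = np.array([[1, 3], [3, 9], [5, 6], [7, 8]])
needles = np.array([[3, 4], [7, 8]])

# Membership test: true as soon as any single position matches
loose = [idx for idx, elem in enumerate(haystack) if elem in needles]

# Row-wise test: every column of elem must equal the same needle row
strict = [idx for idx, elem in enumerate(haystack)
          if (needles == elem).all(axis=1).any()]

print(loose)   # [1, 3]
print(strict)  # [3]
```

The row-wise version is still a Python loop, so it fixes correctness but not speed.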

Is there any faster alternative?

G M
  • `indexes = [idx for idx,elem in enumerate(needles) if elem in haystack ]` This gets indexes in needles, not haystack! – h4z3 Jul 02 '19 at 11:33

2 Answers


The answer to a similar question, "Get intersecting rows across two 2D numpy arrays", gives a solution. It uses the np.in1d function, which is pretty efficient, and passes it a view of each array so that every row is treated as a single element of a 1-D array. In your case, you could do:

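import numpy as np

A = np.array([[1,3],[3,4],[5,6],[7,8]])
B = np.array([[3,4],[7,8]])
nrows, ncols = A.shape
# Structured dtype with one field per column, so each row becomes a single record
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
# Compare whole rows as records; np.where returns a 1-tuple, hence the unpacking
indexes, = np.where(np.in1d(A.view(dtype), B.view(dtype)))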

which outputs:

print(indexes)
> [1 3]
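
For comparison, here is a sketch of a different technique (broadcasting rather than the structured view used above). It assumes both arrays are 2-D with the same number of columns; `find_rows` is a hypothetical helper name. It builds a temporary boolean array of shape (len(haystack), len(needles), ncols), so it is probably only reasonable for small to moderate inputs.

```python
import numpy as np

def find_rows(haystack, needles):
    """Return indexes of haystack rows that equal some row of needles."""
    # (n_hay, 1, ncols) == (1, n_needles, ncols) -> (n_hay, n_needles, ncols)
    equal = haystack[:, None, :] == needles[None, :, :]
    # A row matches a needle only if every column is equal
    matches = equal.all(axis=2)
    # Keep haystack rows that match at least one needle
    return np.flatnonzero(matches.any(axis=1))

haystack = np.array([[1, 3], [3, 4], [5, 6], [7, 8]])
needles = np.array([[3, 4], [7, 8]])
print(find_rows(haystack, needles))  # [1 3]
```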
Ayoub ZAROU

You can try this:

indices = np.apply_along_axis(lambda x: np.where(np.isin(haystack, x).sum(axis=1)==2)[0], 1, needles).flatten()
indices
>>> array([1, 3])
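
One caveat that may matter, shown as a sketch with made-up data: counting np.isin hits per row ignores column order, so a row such as [4, 3] is counted as a match for the needle [3, 4]. The names `count_based` and `exact` are hypothetical, and the positional comparison is only one possible way to tighten the check.

```python
import numpy as np

# Illustrative data: row 1 holds the needle's values in reversed order
haystack = np.array([[1, 3], [4, 3], [5, 6], [7, 8]])
needle = np.array([3, 4])

# Count-based test: both values of the row appear somewhere in the needle
count_based = np.where(np.isin(haystack, needle).sum(axis=1) == 2)[0]

# Positional test: each column must equal the needle's column
exact = np.where((haystack == needle).all(axis=1))[0]

print(count_based)  # [1]
print(exact)        # []
```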
Greeser