2

Given two lists:

In [518]: A
Out[518]: [3, 4, 2, 1, 7, 6, 5]

In [519]: B
Out[519]: [4, 6]

Every element in B exists in A, without exception.

I'd like to retrieve an array of indexes for B as seen in A. For example, 4 is present in index 1 in A, and 6 is in position 5 for B. My expected output is [1, 5] for this scenario.

This is what I did to get the index:

In [520]: np.flatnonzero(np.in1d(a, b))
Out[520]: array([1, 5])

Unfortunately, this won't work in most other cases. For example, if B = [6, 4], my method still outputs [1, 5] when it should output [5, 1].

Is there an efficient numpy way to get what I'm trying to achieve?

cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    Fairly sure this has come up before? What about duplicates? – Jon Clements Oct 21 '17 at 10:29
  • The help for `np.where` gives this exact example: `ix = np.in1d(A.ravel(), B).reshape(A.shape); np.where(ix)`. Sorry, doesn't match your second criterion. – p-robot Oct 21 '17 at 10:30
  • @JonClements Might've, couldn't find anything... As for dupes in B, surely any numpy method worth its salt should be able to handle those appropriately, but it wouldn't hurt me either way. – cs95 Oct 21 '17 at 10:31
  • 1
    @p-robot yes, and besides, my `flatnonzero` method is a little nicer. ;-) – cs95 Oct 21 '17 at 10:32
  • @cᴏʟᴅsᴘᴇᴇᴅ Do you absolutely have to use numpy? If you use default dict and have a list of indices for each value in A, then use that to find out indices for elements in B ... – Asterisk Oct 21 '17 at 10:36
  • 1
    @Asterisk Yes, this question was inspired by https://stackoverflow.com/questions/46862148/how-to-find-the-index-of-the-element-in-a-list-that-first-appears-in-another-giv and a slight modification of Martijn Pieters' answer will give me what I want in python. My question is more out of curiosity for the "how". – cs95 Oct 21 '17 at 10:37
  • 1
    @cᴏʟᴅsᴘᴇᴇᴅ ah ha! I knew I'd seen those A and Bs before :p – Jon Clements Oct 21 '17 at 10:42
  • So, this is different from your previous one - https://stackoverflow.com/questions/45830838/? – Divakar Oct 21 '17 at 11:18
  • @Divakar Wow... I definitely don't remember asking that... (100 questions later... :-) ) I think it _might_ be a dupe, but I can't remember the context, and MaxU's solution is sufficiently different from anything else to leave it open. I leave the decision to you. – cs95 Oct 21 '17 at 11:22
  • So, if there are duplicates in `A`, it should get all those indices and not just the first one? – Divakar Oct 21 '17 at 11:31
  • @Divakar I think it'd be simpler (less headache) to get just the first, and that's fine by me. – cs95 Oct 21 '17 at 11:33

3 Answers3

2

IIUC:

In [71]: a
Out[71]: array([3, 4, 2, 1, 7, 6, 5, 6, 4])

In [72]: b
Out[72]: array([4, 6])

In [73]: np.where(a==b[:,None])[1]
Out[73]: array([1, 8, 5, 7], dtype=int64)

In [74]: b = np.array([6, 4])

In [75]: np.where(a==b[:,None])[1]
Out[75]: array([5, 7, 1, 8], dtype=int64)

UPDATE: if you need only indices of first occurances (in case there are duplicates in A array), then use this solution from @Divakar, which will be faster:

In [84]: (a==b[:,None]).argmax(1)
Out[84]: array([5, 1], dtype=int64)
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Thanks for your response! I'm looking for `[1, 5]` in the first instance and `[5, 1]` in the second. Your answer seems to be getting there but not quite there :-) – cs95 Oct 21 '17 at 11:16
  • @cᴏʟᴅsᴘᴇᴇᴅ, i've changed your `a` array ;-) – MaxU - stand with Ukraine Oct 21 '17 at 11:17
  • Oh my, I didn't realise. Yes, that's exactly what I'm looking for! Thanks so much! – cs95 Oct 21 '17 at 11:19
  • 1
    I guess you would need `(a==b[:,None]).argmax(1)` if [just the first instance is needed](https://stackoverflow.com/questions/46862256/determine-the-position-of-each-element-of-an-array-b-in-another-array-a#comment80673034_46862256). – Divakar Oct 21 '17 at 11:34
  • @Divakar, yeah, with [that additional condition](https://stackoverflow.com/questions/46862256/determine-the-position-of-each-element-of-an-array-b-in-another-array-a#comment80673034_46862256) i totally agree ;-) – MaxU - stand with Ukraine Oct 21 '17 at 11:47
  • I proposed that because `argmax` would be faster than `where`. So, that requirement would be beneficial. – Divakar Oct 21 '17 at 11:49
0

I don't know if it is efficient but

[int(np.isin(A, B[x]).nonzero()[0]) for x in range(len(B))]

seems to fit the bill. If uniqueness is not guaranteed then the int() part can be removed

percusse
  • 3,006
  • 1
  • 14
  • 28
  • Truthfully, I thought of this myself, but I wanted something a little less... loopy. – cs95 Oct 21 '17 at 11:10
0

If m=A.size and n=B.size the where approach is O(mn) . You can stay in O((m+n)log(m+n)) by carefully sort in1d output (with unique values here):

A= np.unique(np.random.randint(0,100000,100000))
np.random.shuffle(A)
B=np.unique(np.random.randint(0,10000,10000))
np.random.shuffle(B)

def find(A,B):
    pos=np.in1d(A,B).nonzero()[0]
    return pos[A[pos].argsort()][B.argsort().argsort()]

In [5]: np.allclose(np.where(np.equal.outer(B,A))[1],find(A,B))
Out[5]: True

In [6]: %time np.where(np.equal.outer(B,A))[1]
Wall time: 3.98 s
Out[6]: array([88220, 13472, 12482, ...,  9795, 39524,  5727], dtype=int64)

In [7]: %timeit find(A,B)
22.6 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
B. M.
  • 18,243
  • 2
  • 35
  • 54