Getting the indices of several rows in a NumPy array at once

Question

I have 2 Arrays, e.g. like:

A: [[1 2 3][2 2 2][1 2 3][2 3 3][2 2 2][2 3 3][2 3 3]]  
B: [[1 2 3][2 2 2][2 3 3]]

B are the sorted unique rows of A.
I need:

C: [0 1 0 2 1 2 2]

Which is the list of indices of B in the order of A. I would like to avoid loops because it needs to be fast even with very big arrays.

The only solutions to this i found were only for 1D Arrays (e.g. Getting the indices of several elements in a NumPy array at once ).
I think this can be solved using np.void in a similar way to this: Find unique rows in numpy.array but i cannot get my head around it :/

I need to use NumPy 1.10 with no other libraries available.

unutbu · Accepted Answer · 2018-02-17T17:46:53.317

3

Given A and B, you can generate C using

In [25]: (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
Out[25]: array([0, 1, 0, 2, 1, 2, 2])

Note that this assumes that every row of B is in A. (Otherwise, argmax could return bogus indices where the equality is False.)

Note that if you had NumPy version 1.13 or newer, then you could use np.unique to generate both B and C at the same time:

In [33]: np.unique(A, axis=0, return_inverse=True)
Out[33]: 
(array([[1, 2, 3],
        [2, 2, 2],
        [2, 3, 3]]), array([0, 1, 0, 2, 1, 2, 2]))

Note that Divakar's solution (using np.void) is far faster, particularly if A has many rows:

A = np.random.randint(10, size=(1000, 3))
B, C = np.unique(A, axis=0, return_inverse=True)

In [44]: %%timeit
   ....: A1D, B1D = view1D(A, B)
   ....: sidx = B1D.argsort()
   ....: out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]
   ....: 
1000 loops, best of 3: 271 µs per loop

In [45]: %timeit (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
100 loops, best of 3: 15.5 ms per loop

edited Feb 17 '18 at 17:46

answered Feb 17 '18 at 17:15

unutbu

842,883
184
1,785
1,677

OP says they are on version 1.10, which AFAIK has no axis argument in unique. – Paul Panzer Feb 17 '18 at 17:16
right, no axis argument before 1.13 sadly. I tried to copy this function from NumpPy 1.14 source into my script but was not able to get it running and was too noobish to find out why. – Koyagi Feb 17 '18 at 17:17
@PaulPanzer: Thanks for the heads-up. – unutbu Feb 17 '18 at 17:17
(B[:,None,:] == A).all(axis=-1).argmax(axis=0) works very good, thx. :) – Koyagi Feb 17 '18 at 17:37

score 3 · Answer 2 · answered Feb 17 '18 at 17:16

Using void dtypes -

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

# https://stackoverflow.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

A1D, B1D = view1D(A, B)
sidx = B1D.argsort()
out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]

Sample run -

In [36]: # Let's take OP sample and shuffle them 
         # to make for a more generic sample case
    ...: A = np.array([[1 ,2, 3],[2, 2, 2],[1, 2, 3],[2, 3, 3],[2 ,2, 2],[2, 3, 3],[2 ,3 ,3]])
    ...: B = np.array([[1, 2, 3],[2, 2 ,2],[2 ,3, 3]])
    ...: 
    ...: np.random.seed(0)
    ...: np.random.shuffle(B)
    ...: indx = np.array([0,1,0,2,1,2,2]) # we need to  retrieve these
                            # as the desired o/p
    ...: A = B[indx]

In [37]: A
Out[37]: 
array([[2, 3, 3],
       [2, 2, 2],
       [2, 3, 3],
       [1, 2, 3],
       [2, 2, 2],
       [1, 2, 3],
       [1, 2, 3]])

In [38]: B
Out[38]: 
array([[2, 3, 3],
       [2, 2, 2],
       [1, 2, 3]])

In [39]: A1D, B1D = view1D(A, B)
    ...: sidx = B1D.argsort()
    ...: out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]

In [40]: out
Out[40]: array([0, 1, 0, 2, 1, 2, 2])

Note that `np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)`, so extra care has to be taken if `np.issubdtype(arr.dtype, np.floating)`. In that case `arr += 0.` fixes the problem. It's not a problem here, but perhaps good to know in general. — unutbu, Feb 17 '18 at 17:21
This works very good, thank you :) I have chosen the other answer as correct, because it is shorter. — Koyagi, Feb 17 '18 at 17:38
@kukuschi note that this answer should scale better. Of course, if you only have a handful of uniques it doesn't matter. — Paul Panzer, Feb 17 '18 at 17:47

Getting the indices of several rows in a NumPy array at once

2 Answers2