How to find indices of a reordered numpy array?

Question

Say I have a sorted numpy array:

arr = np.array([0.0, 0.0],
               [0.5, 0.0],
               [1.0, 0.0],
               [0.0, 0.5],
               [0.5, 0.5],
               [1.0, 0.5],
               [0.0, 1.0],
               [0.5, 1.0],
               [1.0, 1.0])

and suppose I make a non trivial operation on it such that I have a new array which is the same as the old one but in another order:

arr2 = np.array([0.5, 0.0],
                [0.0, 0.0],
                [0.0, 0.5],
                [1.0, 0.0],
                [0.5, 0.5],
                [1.0, 0.5],
                [0.0, 1.0],
                [1.0, 1.0],
                [0.5, 1.0])

The question is: how do you get the indices of where each element of arr2 are placed in arr. In other terms, I want a method that takes both arrays and return an array the same length as arr2 but with the index of the element of arr. For example, the first element of the returned array would be the index of the first element of arr2 in arr.

where_things_are(arr2, arr) 
return : array([1, 0, 3, 2, 4, 5, 6, 8, 7])

Does a function like this already exists in numpy?

EDIT:

I tried:

np.array([np.where((arr == x).all(axis=1)) for x in arr2])

which returns what I want, but my question still holds: is there a more efficient way of doing this using numpy methods?

EDIT2:

It should also work if the length of arr2 is not the same as the length of the original array (like if I removed some elements from it). Thus it is not finding and inverting a permutation but rather finding where elements are located at.

the "inverse" won't be unique - much better to augment the original arr with an added indices axis, carry it thru the "non trivial operation" — f5r5e5d, Feb 14 '17 at 17:38
The non trivial operation I use will preserve the uniqueness yes, but keeping the original indices won't help since the operation does not preserve the order. — fgoudra, Feb 14 '17 at 17:43
apply the same reordering operation to the added indices axis too, afterwards the indices still label the original positions of the transformed elements of arr, easy to sort on the added indices axis to restore original order — f5r5e5d, Feb 14 '17 at 17:49
"It should also work if the length of arr2 is not the same as the length of the original array" - stop changing the question on us. — user2357112, Feb 14 '17 at 18:34

Paul Panzer · Accepted Answer · 2017-02-14T20:11:51.810

The key is inverting permutations. The code below works even if the original array is not sorted. If it is sorted then find_map_sorted can be used which obviously is faster.

UPDATE: Adapting to the OP's ever changing requirements, I've added a branch that handles lost elements.

import numpy as np

def invperm(p):
    q = np.empty_like(p)
    q[p] = np.arange(len(p))
    return q

def find_map(arr1, arr2):
    o1 = np.argsort(arr1)
    o2 = np.argsort(arr2)
    return o2[invperm(o1)]

def find_map_2d(arr1, arr2):
    o1 = np.lexsort(arr1.T)
    o2 = np.lexsort(arr2.T)
    return o2[invperm(o1)]

def find_map_sorted(arr1, arrs=None):
    if arrs is None:
        o1 = np.lexsort(arr1.T)
        return invperm(o1)
    # make unique-able
    rdtype = np.rec.fromrecords(arrs[:1, ::-1]).dtype
    recstack = np.r_[arrs[:,::-1], arr1[:,::-1]].view(rdtype).view(np.recarray)
    uniq, inverse = np.unique(recstack, return_inverse=True)
    return inverse[len(arrs):]

x1 = np.random.permutation(100000)
x2 = np.random.permutation(100000)
print(np.all(x2[find_map(x1, x2)] == x1))

rows = np.random.random((100000, 8))
r1 = rows[x1, :]
r2 = rows[x2, :]
print(np.all(r2[find_map_2d(r1, r2)] == r1))

rs = r1[np.lexsort(r1.T), :]
print(np.all(rs[find_map_sorted(r2), :] == r2))

# lose ten elements
print(np.all(rs[find_map_sorted(r2[:-10], rs), :] == r2[:-10]))

Mazdak · Answer 2 · 2017-02-14T18:23:36.833

Here is a way using numpy Broadcasting:

In [10]: ind = np.where(arr[:, None] == arr2[None, :])[1]

In [11]: ind[np.where(np.diff(ind)==0)]
Out[11]: array([1, 0, 3, 2, 4, 5, 6, 8, 7])

The idea behind this is, increasing the dimension of arrays so that their comparison produces a 3d array which since the original sub-array have length 2 if we had two consecutive equal items in second axis of the result of comparison they would be where both items are equal. For a better demonstration here is the result of comparison without selecting the second axis:

In [96]: np.where(arr[:, None] == arr2[None, :])
Out[96]: 
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
        3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
        7, 7, 8, 8, 8, 8, 8, 8]),
 array([0, 1, 1, 2, 3, 6, 0, 0, 1, 3, 4, 8, 0, 1, 3, 3, 5, 7, 1, 2, 2, 4, 5,
        6, 0, 2, 4, 4, 5, 8, 2, 3, 4, 5, 5, 7, 1, 2, 6, 6, 7, 8, 0, 4, 6, 7,
        8, 8, 3, 5, 6, 7, 7, 8]),
 array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
        0, 1, 0, 0, 1, 0, 1, 1]))

And then for finding those items we just need to find the places that their diff is 0.

Eelco Hoogendoorn · Answer 3 · 2017-02-14T20:20:16.813

2

The numpy_indexed package (disclaimer: i am its author) contains efficient functionality for exactly this type of problem; npi.indices is the ndarray-equivalent of list.index.

import numpy_indexed as npi
idx = npi.indices(arr, arr2)

This returns a list of indices such that arr[idx] == arr2. If arr2 contains elements not present in arr, a ValueError is raised; but you can control that with the 'missing' kwarg.

To answer your question if this functionality is included in numpy; yes, in the sense that numpy is a turing-complete ecosystem. But not really, if you count the number of lines of code required to implement this in an efficient, correct and general manner.

edited Feb 14 '17 at 20:20

answered Feb 14 '17 at 20:13

Eelco Hoogendoorn

10,459
1
44
42

Looks like an interesting extension. Would you mind - very briefly - describing the algorithm you are using? Thanks! – Paul Panzer Feb 14 '17 at 20:34
It is similar to the other arg-sorting based approaches described here, and should be similar in performance as well. The extra lines of code are mostly just to cover the edge cases and to make it more general (like working on ndarrays, taking indices over arbitrary axes, funny dtypes, and so on) – Eelco Hoogendoorn Feb 14 '17 at 20:57

score 0 · Answer 4 · answered Feb 14 '17 at 18:08

If you guarantee uniqueness:

[ np.where(np.logical_and((arr2==x)[:,1], (arr2==x)[:,0])==True)[0][0] for x in arr]

Notice that, I converted your array to 2D: e.g.

arr2 = np.array([[0.5, 0.0],
[0.0, 0.0],
[0.0, 0.5],
[1.0, 0.0],
[0.5, 0.5],
[1.0, 0.5],
[0.0, 1.0],
[1.0, 1.0],
[0.5, 1.0]])

How to find indices of a reordered numpy array?

4 Answers4

Linked