Numpy finding element index in another array

Question

I have an array/set with unique positive integers, i.e.

>>> unique = np.unique(np.random.choice(100, 4, replace=False))

And an array containing multiple elements sampled from this previous array, such as

>>> A = np.random.choice(unique, 100)

I want to map each value of A to the position at which that value occurs in unique.

So far the best solution I found is through a mapping array:

>>> table = np.zeros(unique.max()+1, unique.dtype)
>>> table[unique] = np.arange(unique.size)

The above assigns to each element its index in unique, so it can later be used to map A through advanced indexing:

>>> table[A]
array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0,
       0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1,
       3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2,
       3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1,
       3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)

This already gives me the correct result. However, if the values in unique are large and sparse, this approach requires creating a very large table array just to store a few numbers for the later mapping.

Is there any better solution?

NOTE: both A and unique are sample arrays, not real arrays. So the question is not how to generate positional indexes, but how to efficiently map elements of A to indexes in unique. The pseudocode of what I'd like to speed up in numpy is as follows:

B = np.zeros_like(A)
for i in range(A.size):
    B[i] = unique.index(A[i])

(assuming unique is a list in the above pseudocode).

score 4 · Accepted Answer · edited May 23 '17 at 12:09

The table approach described in your question is the best option when unique is pretty dense, but unique.searchsorted(A) should produce the same result and doesn't require unique to be dense. searchsorted is great with ints; if anyone is trying to do this kind of thing with floats, which have precision limitations, consider something like this.
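A minimal sketch of the searchsorted approach, using made-up sparse values rather than the arrays from the question. Note that searchsorted returns insertion points, so a value of A that is missing from unique still gets an index; the `found` check below is an added safeguard, not part of the original suggestion.

```python
import numpy as np

# Hypothetical sample data: sorted, sparse, unique values.
unique = np.array([7, 1_000, 50_000_000, 2_000_000_000], dtype=np.int64)
A = np.array([50_000_000, 7, 2_000_000_000, 7, 1_000], dtype=np.int64)

# Binary search: for each element of A, the position where it sits in unique.
idx = np.searchsorted(unique, A)

# searchsorted gives insertion points, so confirm every value was really found.
# Clamp first, because a value larger than unique.max() yields idx == unique.size.
found = unique[np.minimum(idx, unique.size - 1)] == A

print(idx)          # [2 0 3 0 1]
print(found.all())  # True
```

No lookup table sized by unique.max() is needed; the only extra memory is the index array, which has the same length as A.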

answered May 26 '16 at 17:03

Bi Rico

And `sorter` could be used with it, if `unique` is not already sorted. – Divakar May 26 '16 at 17:31
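As a sketch of what the `sorter` suggestion might look like, assuming the lookup array is unsorted (the name `unsorted` and its values are made up for illustration): searchsorted then returns positions in the sorted order, which have to be mapped back through the argsort result.

```python
import numpy as np

# Hypothetical unsorted lookup array with unique values.
unsorted = np.array([50, 7, 2000, 1000])
A = np.array([7, 2000, 50, 1000, 7])

# order[k] is the index in `unsorted` of the k-th smallest value.
order = np.argsort(unsorted)

# Positions of A's values within the sorted view of `unsorted`.
pos_in_sorted = np.searchsorted(unsorted, A, sorter=order)

# Translate sorted positions back to indices into the original array.
idx = order[pos_in_sorted]

print(idx)  # [1 2 0 3 1]
```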

score 2 · Answer 2 · answered May 26 '16 at 15:10

You can use standard python dict with np.vectorize

inds = {e:i for i, e in enumerate(unique)}
B = np.vectorize(inds.get)(A)

hilberts_drinking_problem

Interesting approach, I will have to test the performance of `np.vectorize` for large matrices though. – Imanol Luengo May 26 '16 at 15:19
np.vectorize loops on the python level, so no need to perform that test... it's just syntactic sugar – Eelco Hoogendoorn May 26 '16 at 18:32

score 2 · Answer 3 · answered May 26 '16 at 18:31

The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index, which does not require memory proportional to the max element, but only proportional to the input itself:

import numpy_indexed as npi
npi.indices(unique, A)

Note that it also works for arbitrary dtypes and dimensions. Also, the array being queried does not need to be unique; the first index encountered will be returned, the same as for list.
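For readers without the package, the "first index encountered" behaviour for a non-unique queried array can be approximated in plain NumPy. This is only a sketch under the assumption that every queried value is present and the arrays are 1-D; it is not how numpy_indexed is implemented, and the names `values` and `queries` are made up for illustration.

```python
import numpy as np

# Hypothetical array being queried; it contains repeated values.
values = np.array([5, 3, 5, 9, 3])
queries = np.array([5, 9, 3])

# A stable sort keeps equal values in their original order, so within each
# run of duplicates the first entry of `order` is the earliest original index.
order = np.argsort(values, kind="stable")

# side="left" lands on the first element of each run of duplicates.
pos = np.searchsorted(values, queries, side="left", sorter=order)
first_idx = order[pos]

print(first_idx)  # [0 3 1]
```

The result matches what list.index would return for each query: 5 is first seen at index 0, 9 at index 3, and 3 at index 1.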