7

I have a numpy array, whose elements are unique, for example:

b = np.array([5, 4, 6, 8, 1, 2])

(Edit2: b can have large numbers, and float numbers. The above example is there for simplicity)

I am getting numbers, that are elements in b.

I want to find their index in b, meaning I want a reverse mapping, from value to index, in b.

I could do

for number in input:
    ind = np.where(number==b)

which would iterate over the entire array every call to where.

I could also create a dictionary,

d = {}
for i, element in enumerate(list(b)):
    d[element] = i

I could create this dictionary at "preprocessing" time, but still I would be left with a strange looking dictionary, in a mostly numpy code, which seems (to me) not how numpy is meant to be used.

How can I do this reverse mapping in numpy?

usage (O(1) time and memory required):

print("index of 8 is: ", foo(b, 8))

  • Edit1: not a duplicate of this

Using in1d like explained here doesn't solve my problem. Using their example:

b = np.array([1, 2, 3, 10, 4])

I want to be able to find for example 10's index in b, at runtime, in O(1).

Doing a pre-processing move

mapping = np.in1d(b, b).nonzero()[0]

>> [0, 1, 2, 3, 4]

(which could be accomplished using np.arange(len(b)))

doesn't really help, because when 10 comes in as input, It is not possible to tell its index in O(1) time with this method.

Gulzar
  • 23,452
  • 27
  • 113
  • 201
  • Is it a 1D array? What about duplicates? Etc, etc – cs95 Jan 11 '19 at 20:03
  • 1
    perhaps a good candidate for [code golf](https://codegolf.stackexchange.com/)? – uhoh Jan 12 '19 at 03:55
  • 1
    @uhoh I second the golf suggestion, but first the OP needs to nail down the data structure he actually wants to store the lookup table in – tel Jan 12 '19 at 03:56
  • 1
    @uhoh Did you end up making a golf on this? If so, can you please link? – Gulzar May 19 '21 at 12:59
  • @Gulzar I have made [one golf](https://codegolf.stackexchange.com/q/212300/85527) so far but it's not related to this. Go for it! :-) – uhoh May 19 '21 at 13:02

4 Answers4

2

It's simpler than you think, by exploiting numpy's advanced indexing.

What we do is make our target array and just assign usign b as an index. We'll assign the indices we want by using arange.

>>> t = np.zeros((np.max(b) + 1,))
>>> t[b] = np.arange(0, b.size)
>>> t
array([0., 4., 5., 0., 1., 0., 2., 0., 3.])

You might use nans or -1 instead of zeros to construct the target to help detect invalid lookups.

Memory usage: this is optimally performant in both space and time as it's handled entirely by numpy.

If you can tolerate collisions, you can implement a poor man's hashtable. Suppose we have currencies, for example:

h = np.int32(b * 100.0) % 101  # Typically some prime number
t = np.zeros((101,))
t[h] = np.arange(0, h.size)

# Retrieving a value v; keep in mind v can be an ndarray itself.
t[np.int32(v * 100.0) % 101]

You can do any other steps to munge the address if you know what your dataset looks like.

This is about the limit of what's useful to do with numpy.

Ben
  • 836
  • 8
  • 18
  • what if b has an element whose value is `99999.4`? – Gulzar Jan 11 '19 at 20:22
  • @Gulzar See my comment on my answer. I believe that this is simply not practical to achieve with numpy. – Joe Iddon Jan 11 '19 at 20:23
  • I'm going by the question, which indicates integers and a compact array. If you need a general purpose mapping, use a dictionary, but that wasn't what the question asked. If you're trying to look up floats, you'll want to normalize them in order to handle rounding errors. – Ben Jan 11 '19 at 20:24
2

Solution

If you want constant time (ie O(1)), then you'll need to precompute a lookup table of some sort. If you want to make your lookup table using another Numpy array, it'll effectively have to be a sparse array, in which most values are "empty". Here's a workable approach in which empty values are marked as -1:

b = np.array([5, 4, 6, 8, 1, 2])

_b_ix = np.array([-1]*(b.max() + 1))
_b_ix[b] = np.arange(b.size)
# _b_ix: array([-1,  4,  5, -1,  1,  0,  2, -1,  3])

def foo(*val):
    return _b_ix[list(val)]

Test:

print("index of 8 is: %s" % foo(8))
print("index of 0,5,1,8 is: %s" % foo(0,5,1,8))

Output:

index of 8 is: [3]
index of 0,5,1,8 is: [-1  0  4  3]

Caveat

In production code, you should definitely use a dictionary to solve this problem, as other answerers have pointed out. Why? Well, for one thing, say that your array b contains float values, or any non-int value. Then a Numpy-based lookup table won't work at all.

Thus, you should use the above answer only if you have a deep-seated philosophical opposition to using a dictionary (eg a dict ran over your pet cat). Here's a nice way to generate a reverse lookup dict:

ix = {k:v for v,k in enumerate(b.flat)}
tel
  • 13,005
  • 2
  • 44
  • 62
  • In the case b contains integers, is there a difference in cpu time between the fully numpy solution and the dictionnary one? – Johann Bzh Feb 12 '19 at 14:40
1

You can use dict, zip and numpy.arrange to create your reverse lookup:

import numpy 

b = np.array([5, 4, 6, 8, 1, 2])
d = dict(zip(b, np.arange(0,len(b))))
print(d)

gives:

{5: 0, 4: 1, 6: 2, 8: 3, 1: 4, 2: 5}
Alex
  • 21,273
  • 10
  • 61
  • 73
0

If you want to do multiple lookups, you can do these in O(1) after an initial O(n) traversal to create a lookup dictionary.

b = np.array([5, 4, 6, 8, 1, 2])
lookup_dict = {e:i for i,e in enumerate(b)}
def foo(element):
    return lookup_dict[element]

And this works for your test:

>>> print('index of 8 is:', foo(8))
index of 8 is:  3

Note that if there is a possibility that b may have changed since the last foo() call, we must re-create the dictionary.

Joe Iddon
  • 20,101
  • 7
  • 33
  • 54
  • the array never changes. As for the answer, didn't you just did exactly what I wrote in my question? – Gulzar Jan 11 '19 at 20:17
  • 1
    @Gulzar My bad, I did not fully read the question! On further consideration, I stand by this as the best way to create a reverse mapping. This can not be achieved with numpy as the module does not have support for "lookup-like" structures; only arrays. Therefore the only way to achieve this in numpy would be to create a massive array where the index of each element is the value of each element in `b` and the value of each element is the index of the index of that value in `b`. This would use a lot of memory (as elements would be `0`) and also doesn't account for negatives or floats. – Joe Iddon Jan 11 '19 at 20:22