Using Numpy arrays as lookup tables

Question

I have a 2D array of Numpy data read from a .csv file. Each row represents a data point, with the final column containing a 'key' which corresponds uniquely to a 'key' in another Numpy array - the 'lookup table' as it were.

What is the best (most Numpythonic) way to match up the lines in the first table with the values in the second?

The answer with a lookup `dict` is cool, but it's very inefficient for large lookup tables. If you want to "lookup" values, you can use `np.interp` with `xp` as your lookup keys and `fp` as your values (which have to be floats I guess...). This way the lookup is done in native `numpy` instead of Python iteration (say you want to put a large image through a lookup table, e.g. applying a color map). – Tomasz Gandor, Dec 06 '17 at 18:30
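
A minimal sketch of the `np.interp` idea from the comment above, using the key and first value column of the example data in the accepted answer. The names `keys`, `values`, `queries` and `looked_up` are made up for illustration. It assumes the keys are sorted in increasing order, which `np.interp` requires of `xp`.

```python
import numpy as np

# Lookup keys (xp) must be increasing; values (fp) are one float per key.
keys = np.array([1.0, 2.0, 3.0])
values = np.array([3.14, 2.71818, 42.0])

# Keys to look up, e.g. the key column of the data table.
queries = np.array([1, 1, 2, 3])

# A query that exactly matches a key gets that key's value.
looked_up = np.interp(queries, keys, values)
print(looked_up)
```

Note that `np.interp` only handles a single 1-D column of values per call, and a query that falls between two keys is linearly interpolated rather than rejected, so this is only a safe lookup when every query is known to be a valid key.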

score 10 · Accepted Answer · answered Aug 19 '10 at 15:22

Some example data:

import numpy as np

lookup = np.array([[  1.     ,   3.14   ,   4.14   ],
                   [  2.     ,   2.71818,   3.7    ],
                   [  3.     ,  42.     ,  43.     ]])

a = np.array([[ 1, 11],
              [ 1, 12],
              [ 2, 21],
              [ 3, 31]])

Build a dictionary from key to row number in the lookup table:

mapping = dict(zip(lookup[:,0], range(len(lookup))))

Then you can use the dictionary to match up lines. For instance, if you just want to join the tables:

>>> np.hstack((a, np.array([lookup[mapping[key],1:]
...                         for key in a[:,0]])))
array([[  1.     ,  11.     ,   3.14   ,   4.14   ],
       [  1.     ,  12.     ,   3.14   ,   4.14   ],
       [  2.     ,  21.     ,   2.71818,   3.7    ],
       [  3.     ,  31.     ,  42.     ,  43.     ]])
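
For large tables, the per-row dictionary lookup can be replaced by a vectorized search. The following is a sketch of one alternative (not the method used in this answer), based on `np.argsort` and `np.searchsorted`. The names `order`, `sorted_keys`, `pos`, `rows` and `joined` are made up for illustration, and the sketch assumes the lookup keys are unique.

```python
import numpy as np

lookup = np.array([[1.0, 3.14, 4.14],
                   [2.0, 2.71818, 3.7],
                   [3.0, 42.0, 43.0]])

a = np.array([[1, 11],
              [1, 12],
              [2, 21],
              [3, 31]])

# Sort the lookup keys once so they can be binary-searched.
order = np.argsort(lookup[:, 0])
sorted_keys = lookup[order, 0]

# Position of each key of `a` within the sorted keys.
pos = np.searchsorted(sorted_keys, a[:, 0])

# Clip so that keys larger than every lookup key do not index out of bounds,
# then check that every key was actually found.
pos = np.clip(pos, 0, len(sorted_keys) - 1)
if not np.all(sorted_keys[pos] == a[:, 0]):
    raise KeyError("some keys in `a` are missing from the lookup table")

# Map sorted positions back to row numbers in the original lookup table.
rows = order[pos]
joined = np.hstack((a, lookup[rows, 1:]))
print(joined)
```

This produces the same joined table as the dictionary version, but the matching is done inside NumPy rather than in a Python loop.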

Vebjorn Ljosa

+1 for getting +1 from Alex Martelli ;) And for having a useful answer, of course. – Wayne Werner Aug 19 '10 at 15:49
For whatever it's worth, there is a built-in numpy function to do this: `numpy.lib.recfunctions.join_by`. http://projects.scipy.org/numpy/browser/trunk/numpy/lib/recfunctions.py#L826 It's rather clunky if you're not already using structured arrays, though. – Joe Kington Aug 19 '10 at 15:53
Can someone explain to me what this portion does exactly? `np.array([lookup[mapping[key],1:] for key in a[:,0]])` – Carl Sep 29 '14 at 05:39
@Carl, it takes each key from the first column of `a` and looks up the matching row in the `lookup` array. Then it makes an array of those rows, leaving out the first column (the key). – Vebjorn Ljosa Oct 01 '14 at 01:25
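
The explanation above can be spelled out as an explicit loop. This is only an unrolled sketch of the same list comprehension, using the example data from the answer. The names `matched_rows`, `row_number` and `matched` are made up for illustration.

```python
import numpy as np

lookup = np.array([[1.0, 3.14, 4.14],
                   [2.0, 2.71818, 3.7],
                   [3.0, 42.0, 43.0]])

a = np.array([[1, 11],
              [1, 12],
              [2, 21],
              [3, 31]])

# Key -> row number in the lookup table, as in the answer.
mapping = dict(zip(lookup[:, 0], range(len(lookup))))

matched_rows = []
for key in a[:, 0]:                  # each key in the first column of `a`
    row_number = mapping[key]        # which row of `lookup` has this key
    # Keep everything except the first column (the key itself).
    matched_rows.append(lookup[row_number, 1:])

# Stack the collected rows into one 2D array, one row per row of `a`.
matched = np.array(matched_rows)
print(matched)
```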

score 5 · Answer 2 · answered Aug 22 '10 at 15:10

In the special case where the row index can be computed directly from the key, the dictionary can be avoided. This is an advantage when you are free to choose the keys of the lookup table.

For Vebjorn Ljosa's example:

lookup:

>>> lookup[a[:,0]-1, :]
array([[  1.     ,   3.14   ,   4.14   ],
       [  1.     ,   3.14   ,   4.14   ],
       [  2.     ,   2.71818,   3.7    ],
       [  3.     ,  42.     ,  43.     ]])

merge:

>>> np.hstack([a, lookup[a[:,0]-1, :]])
array([[  1.     ,  11.     ,   1.     ,   3.14   ,   4.14   ],
       [  1.     ,  12.     ,   1.     ,   3.14   ,   4.14   ],
       [  2.     ,  21.     ,   2.     ,   2.71818,   3.7    ],
       [  3.     ,  31.     ,   3.     ,  42.     ,  43.     ]])
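
The merge above keeps the key column of `lookup`, so the key appears twice in each row. A small variation, sketched here under the assumption that the lookup keys are consecutive integers stored in increasing row order, drops the duplicate key and derives the offset from the first key instead of hard-coding `-1`. The names `first_key`, `rows` and `merged` are made up for illustration.

```python
import numpy as np

lookup = np.array([[1.0, 3.14, 4.14],
                   [2.0, 2.71818, 3.7],
                   [3.0, 42.0, 43.0]])

a = np.array([[1, 11],
              [1, 12],
              [2, 21],
              [3, 31]])

# Assumption: keys are consecutive integers, so key - first_key is the row index.
first_key = int(lookup[0, 0])
rows = a[:, 0] - first_key

# Take only the value columns (1:) so the key is not repeated in the result.
merged = np.hstack([a, lookup[rows, 1:]])
print(merged)
```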
