Lookup table using 2 columns to identify rows

Question

I have a numpy lookup table of xy coordinates, where columns 0=xa, 1=ya, 2=xb, 3=yb. I'm trying to use xa and ya (cols 0 & 1) to act as the pair of elements that enable the looking up of xb and yb (cols 2 & 3), which are the actual xy coordinates I want to use.

lookup=
[[0,    0,  0,      0]
[2,     0,  1.98,   -0.01]
[4,     0,  3.99,   -0.01]
[6,     0,  6.03,   -0.01]
[8,     0,  8.02,   -0.03]
[10,    0,  9.98,   -0.01]
[12,    0,  11.99,  0]
[14,    0,  13.99,  0]
[0,     1,  -0.03,  0.88]
[2,     1,  1.95,   0.86]
[4,     1,  3.97,   0.85]
[6,     1,  5.97,   0.87]
[8,     1,  7.96,   0.86]
[10,    1,  9.95,   0.92]
[12,    1,  11.95,  0.92]
[14,    1,  13.97,  0.87]]

I have a table of x and y locations in the format xa ya, which I want to convert to xb yb using the lookup table:

gridloc=
[[6,    0]
 [8,    0]
 [8,    0]
 [10,   0]
 [8,    1]
 [10,   1]
 [12,   1]
 [14,   1]]

So I want the result to be this:

newloc=
[[6.03,   -0.01]
 [8.02,   -0.03]
 [8.02,   -0.03]
 [9.98,   -0.01]
 [7.96,   0.86]
 [9.95,   0.92]
 [11.95,  0.92]
 [13.97,  0.87]]

I tried to build a dictionary like this, but I get an error:

mapping = dict(zip(lookup[:,0:2], range(len(lookup))))

Traceback (most recent call last):

  File "<ipython-input-12-528fb6616ce0>", line 1, in <module>
    mapping = dict(zip(lookup[:,0:2], range(len(lookup))))

TypeError: unhashable type: 'numpy.ndarray'

Does anyone have any advice, please? Should my tables be in numpy in the first place? Is dict the way to solve the problem?

score 3 · Answer 1 · answered Apr 04 '18 at 10:04

Here is one Numpythonic approach:

In [89]: mask = np.logical_and(gridloc[:,0] == lookup[:,None,0], gridloc[:,1] == lookup[:,None, 1])

In [90]: ind = np.where(mask)[0]

In [91]: lookup[ind, 2:]
Out[91]: 
array([[ 6.030e+00, -1.000e-02],
       [ 8.020e+00, -3.000e-02],
       [ 8.020e+00, -3.000e-02],
       [ 9.980e+00, -1.000e-02],
       [ 7.960e+00,  8.600e-01],
       [ 9.950e+00,  9.200e-01],
       [ 1.195e+01,  9.200e-01],
       [ 1.397e+01,  8.700e-01]])
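
One thing to check when adapting this: with the `lookup` rows on the first axis of the mask, `np.where(mask)[0]` lists the matches in `lookup` order. That coincides with `gridloc` order in the example only because `gridloc` happens to be sorted the same way. Below is a minimal sketch of an order-preserving variant, assuming every `gridloc` row matches exactly one `lookup` row; the small arrays are made up for illustration and are not the data from the question.

```python
import numpy as np

# Illustrative data (hypothetical): gridloc is deliberately NOT in lookup order.
lookup = np.array([[0, 0, 0.00, 0.00],
                   [2, 0, 1.98, -0.01],
                   [0, 1, -0.03, 0.88],
                   [2, 1, 1.95, 0.86]])
gridloc = np.array([[2, 1], [0, 0], [2, 1], [2, 0]])

# mask[j, i] is True where gridloc row j matches lookup row i,
# so gridloc runs along the first axis.
mask = ((gridloc[:, None, 0] == lookup[None, :, 0]) &
        (gridloc[:, None, 1] == lookup[None, :, 1]))

# np.nonzero scans row by row, so the matches come out in gridloc order.
rows, ind = np.nonzero(mask)
newloc = lookup[ind, 2:]

# For contrast: with lookup on the first axis, the indices are in lookup order.
lookup_major = np.where(mask.T)[0]
```

The mask still holds `len(gridloc) * len(lookup)` booleans, so the memory cost is unchanged; only the order of the result differs.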

Mazdak

105,000
18
159
188

This is good, although it's worth noting that it takes quadratic space (or, more specifically, `O(len(gridloc) * len(lookup))` space). – jdehesa Apr 04 '18 at 10:19
@jdehesa What do you mean by "worth noting" exactly? Worth for what? Can you please elaborate? I assume you mean in terms of memory usage; if so, that's what you usually have to give up in exchange for runtime performance. And since memory usage is not a critical issue here, I didn't propose a generator-based approach, which is far slower than a vectorized approach like this. – Mazdak Apr 04 '18 at 10:41
What I mean by "worth noting" is that it is a factor to consider. If `gridloc` and `lookup` have 100k rows each it may not be a feasible option. I'm not saying that makes it a bad answer, it's probably the best option in most similar cases (that's why I upvoted it), but if one has big-ish arrays it may be necessary to resort to something different, even if it's less efficient. – jdehesa Apr 04 '18 at 10:41
@jdehesa You're right, and that's actually an obvious fact. Many other things may seem trivial at first, but one can easily prove that they can cause real damage if not taken care of. One of them is the item size, which by default is `float64`. Nevertheless, my point is not to use definite verbs in such situations, for all the reasons I described. – Mazdak Apr 04 '18 at 10:46
I couldn't get this to work on my larger, real-life examples, so I couldn't accept this answer, sorry – georussell Apr 04 '18 at 10:51
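
For arrays too large for a full comparison mask, one possible lower-memory route (not proposed in the thread, offered only as a sketch) is to give every distinct (xa, ya) pair an integer id with `np.unique(..., axis=0, return_inverse=True)` and then index through those ids. The helper name `lookup_rows` is hypothetical, and the sketch assumes the keys match exactly as floats and that `lookup` has no duplicate keys (if it does, the last one wins).

```python
import numpy as np

def lookup_rows(lookup, gridloc):
    """Return the (xb, yb) rows of `lookup` for each (xa, ya) row of `gridloc`."""
    n = len(lookup)
    # Stack both key sets so they share a single id space.
    keys = np.vstack([lookup[:, :2], gridloc])
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()  # guard against NumPy versions that return a 2-D inverse

    # pos[id] is the lookup row holding that key, or -1 if the key is absent.
    pos = np.full(inv.max() + 1, -1)
    pos[inv[:n]] = np.arange(n)

    ind = pos[inv[n:]]
    if (ind < 0).any():
        raise KeyError("some gridloc rows are not present in lookup")
    return lookup[ind, 2:]

# Illustrative data (hypothetical), with gridloc out of lookup order.
lookup = np.array([[0, 0, 0.00, 0.00],
                   [2, 0, 1.98, -0.01],
                   [0, 1, -0.03, 0.88],
                   [2, 1, 1.95, 0.86]])
gridloc = np.array([[2, 1], [0, 0], [2, 1], [2, 0]])
newloc = lookup_rows(lookup, gridloc)
```

Memory here grows with `len(lookup) + len(gridloc)` rather than their product, at the cost of a sort inside `np.unique`.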

score 2 · Accepted Answer · answered Apr 04 '18 at 10:31

One option is to do it using Pandas indexing capabilities:

import numpy as np
import pandas as pd

lookup = np.array(
    [[0,    0,  0,      0],
     [2,     0,  1.98,   -0.01],
     [4,     0,  3.99,   -0.01],
     [6,     0,  6.03,   -0.01],
     [8,     0,  8.02,   -0.03],
     [10,    0,  9.98,   -0.01],
     [12,    0,  11.99,  0],
     [14,    0,  13.99,  0],
     [0,     1,  -0.03,  0.88],
     [2,     1,  1.95,   0.86],
     [4,     1,  3.97,   0.85],
     [6,     1,  5.97,   0.87],
     [8,     1,  7.96,   0.86],
     [10,    1,  9.95,   0.92],
     [12,    1,  11.95,  0.92],
     [14,    1,  13.97,  0.87]])
gridloc = np.array(
    [[6,    0],
     [8,    0],
     [8,    0],
     [10,   0],
     [8,    1],
     [10,   1],
     [12,   1],
     [14,   1]])

idx = pd.MultiIndex.from_arrays([lookup[:, 0], lookup[:, 1]], names=('xa', 'ya'))
df = pd.DataFrame(lookup[:, 2:], columns=('xb', 'yb'), index=idx)
# This should work, but indexing with a multidimensional array is not implemented:
# newloc = df.loc[gridloc].values
# Converting the rows to a list of tuples works
newloc = df.loc[list(map(tuple, gridloc))].values  # add .copy() if you need to write to the result
print(newloc)

Output:

[[  6.03000000e+00  -1.00000000e-02]
 [  8.02000000e+00  -3.00000000e-02]
 [  8.02000000e+00  -3.00000000e-02]
 [  9.98000000e+00  -1.00000000e-02]
 [  7.96000000e+00   8.60000000e-01]
 [  9.95000000e+00   9.20000000e-01]
 [  1.19500000e+01   9.20000000e-01]
 [  1.39700000e+01   8.70000000e-01]]
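
A related pandas route, shown here only as a sketch and not part of the answer above, is a left merge on the two key columns. It assumes the (xa, ya) pairs in `lookup` are unique, which `validate="many_to_one"` checks; the small arrays are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical), with gridloc out of lookup order.
lookup = np.array([[0, 0, 0.00, 0.00],
                   [2, 0, 1.98, -0.01],
                   [0, 1, -0.03, 0.88],
                   [2, 1, 1.95, 0.86]])
gridloc = np.array([[2, 1], [0, 0], [2, 1], [2, 0]])

lookup_df = pd.DataFrame(lookup, columns=["xa", "ya", "xb", "yb"])
# Cast to float so the key columns have the same dtype on both sides.
grid_df = pd.DataFrame(gridloc.astype(float), columns=["xa", "ya"])

# A left merge keeps one output row per gridloc row, in gridloc order;
# keys missing from lookup would produce NaN in xb and yb.
merged = grid_df.merge(lookup_df, on=["xa", "ya"], how="left",
                       validate="many_to_one")
newloc = merged[["xb", "yb"]].to_numpy()
```

Unlike `df.loc` with a list of tuples, a missing key shows up here as a row of NaN, which may or may not be what you want.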

zipa · Answer 3 · 2018-04-04T10:11:52.030

First of all, NumPy arrays, like lists, are mutable and therefore unhashable, so they cannot be used as dict keys. That's why you need to convert your data to tuples:

mapping = dict(zip(map(tuple, lookup[:, :2]), map(tuple, lookup[:, 2:])))
mapping
#{(0.0, 0.0): (0.0, 0.0),
# (0.0, 1.0): (-0.029999999999999999, 0.88),
# (2.0, 0.0): (1.98, -0.01),
# (2.0, 1.0): (1.95, 0.85999999999999999),
# (4.0, 0.0): (3.9900000000000002, -0.01),
# (4.0, 1.0): (3.9700000000000002, 0.84999999999999998),
# (6.0, 0.0): (6.0300000000000002, -0.01),
# (6.0, 1.0): (5.9699999999999998, 0.87),
# (8.0, 0.0): (8.0199999999999996, -0.029999999999999999),
# (8.0, 1.0): (7.96, 0.85999999999999999),
# (10.0, 0.0): (9.9800000000000004, -0.01),
# (10.0, 1.0): (9.9499999999999993, 0.92000000000000004),
# (12.0, 0.0): (11.99, 0.0),
# (12.0, 1.0): (11.949999999999999, 0.92000000000000004),
# (14.0, 0.0): (13.99, 0.0),
# (14.0, 1.0): (13.970000000000001, 0.87)}

Now, to achieve your goal, convert gridloc to a list of tuples and look each one up in mapping:

gridloc = list(map(mapping.get, map(tuple, gridloc)))
gridloc
#[(6.0300000000000002, -0.01),
# (8.0199999999999996, -0.029999999999999999),
# (8.0199999999999996, -0.029999999999999999),
# (9.9800000000000004, -0.01),
# (7.96, 0.85999999999999999),
# (9.9499999999999993, 0.92000000000000004),
# (11.949999999999999, 0.92000000000000004),
# (13.970000000000001, 0.87)]
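
If an array is needed afterwards, the same dictionary idea can be wrapped so that the result comes back as an `ndarray`. This is a minimal sketch with made-up data. It indexes with `mapping[...]` rather than `mapping.get`, so a missing key raises `KeyError` instead of silently putting `None` in the result.

```python
import numpy as np

# Illustrative data (hypothetical), with gridloc out of lookup order.
lookup = np.array([[0, 0, 0.00, 0.00],
                   [2, 0, 1.98, -0.01],
                   [0, 1, -0.03, 0.88],
                   [2, 1, 1.95, 0.86]])
gridloc = np.array([[2, 1], [0, 0], [2, 1], [2, 0]])

# Tuples of plain Python numbers are hashable, unlike ndarray rows.
mapping = {(xa, ya): (xb, yb) for xa, ya, xb, yb in lookup.tolist()}

# The integer key (2, 1) finds the float key (2.0, 1.0) because they compare equal
# and hash identically.
newloc = np.array([mapping[tuple(row)] for row in gridloc.tolist()])
```

This keeps `gridloc` itself untouched, so the original coordinates are still available after the lookup.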

P.S. Floating point math is not broken.

edited Apr 04 '18 at 10:11

answered Apr 04 '18 at 09:56

zipa


Do you need to map values to `tuples` too? – jpp Apr 04 '18 at 09:59
@jpp On the other hand - you look for tuples and you get back tuples - looks prettier to me :) – zipa Apr 04 '18 at 10:01
I couldn't manage to get this to give an output, only a map object. Thanks for the pointer on floating point numbers. I really need to take a course on Python rather than carry on stumbling around in the dark like I have been – georussell Apr 04 '18 at 10:54
@georussell Did you try it like in edit with `gridloc = list(map(mapping.get, map(tuple, gridloc)))`? – zipa Apr 04 '18 at 10:55
Missed that, but that does work, thanks. Going with the Pandas answer just because it is more useful to me to have an array going forward – georussell Apr 04 '18 at 11:02
@georussell That's a great solution, too bad I didn't go with `pandas` myself in the first place :) – zipa Apr 04 '18 at 11:04
