
This may be the wrong general approach, but I'm trying to use a pandas Series as, essentially, a lookup table for some NumPy arrays of strings/labels:

import pandas as pd
import numpy as np

data_map = pd.Series([2, 4, 6, 0, 1], index=list('abcde'))
lab1d = np.array(['a', 'd', 'd', 'c'])
lab2d = np.array([['d', 'e'],
                  ['e', 'd'],
                  ['c', 'a'],
                  ['a', 'b']])

val1d = data_map.loc[lab1d]
val2d = data_map.loc[lab2d]

If I do this, val1d resolves correctly to:

a    2
d    0
d    0
c    6
dtype: int64

But val2d = data_map.loc[lab2d] raises a ValueError:

ValueError: Cannot index with multidimensional key

I think I understand why this doesn't work, but what is the proper way to take a numpy.ndarray of shape (m, n) containing index values and return (something which can be converted to) a numpy.ndarray of shape (m, n) containing the corresponding values?

**Edit:** I had considered storing the data in a dictionary instead and using numpy.vectorize (as illustrated in this question), which is definitely my fallback, but I want to clarify whether there's a way to do this using pandas methods.

**Edit 2:** I should clarify that I'm actually looking for something that follows broadcasting rules, e.g.:

dmd = data_map.to_dict()
make_map = np.vectorize(dmd.__getitem__)

val1d = make_map(lab1d)
val2d = make_map(lab2d)

Which for val1d returns:

array([2, 0, 0, 6])

And for val2d returns:

array([[0, 1],
       [1, 0],
       [6, 2],
       [2, 4]])
  • Note: If my "Edit 2" example is basically the "right" way to do this, then I think this question can be closed as a duplicate of [the linked question](https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key). – Paul Jan 20 '16 at 20:02

2 Answers


Here's a vectorized approach using np.searchsorted:

data_map[np.searchsorted(np.array(data_map.index), lab1d)]
data_map[np.searchsorted(np.array(data_map.index), lab2d)]

Sample run -

>>> data_map = pd.Series([2, 4, 6, 0, 1], index=list('abcde'))
>>> lab1d = np.array(['a', 'd', 'd', 'c'])
>>> lab2d = np.array([['d', 'e'],
...                   ['e', 'd'],
...                   ['c', 'a'],
...                   ['a', 'b']])
>>> data_map[np.searchsorted(np.array(data_map.index), lab1d)]
a    2
d    0
d    0
c    6
dtype: int64
>>> data_map[np.searchsorted(np.array(data_map.index), lab2d)]
array([[0, 1],
       [1, 0],
       [6, 2],
       [2, 4]])
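
One caveat: np.searchsorted assumes the array it searches is sorted, which happens to hold for this index ('a' through 'e'). For an unsorted index you can pass a sorter argument; here is a minimal sketch, using a hypothetical unsorted data_map rather than the one from the question:

import numpy as np
import pandas as pd

# Hypothetical Series with an unsorted index, to illustrate the general case.
data_map = pd.Series([6, 2, 1, 4, 0], index=list('cadbe'))
lab2d = np.array([['d', 'e'],
                  ['e', 'd'],
                  ['c', 'a'],
                  ['a', 'b']])

idx = np.array(data_map.index)
order = np.argsort(idx)                          # permutation that sorts the index
pos = np.searchsorted(idx, lab2d, sorter=order)  # positions in the *sorted* index
val2d = data_map.to_numpy()[order[pos]]          # map back to the original positions

This returns array([[1, 0], [0, 1], [6, 2], [2, 4]]), matching data_map['d'] == 1, data_map['e'] == 0, and so on.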
  • This looks good. Comparing with %timeit, it's about 2-3x slower than the dictionary-based lookup, but this looks like basically the right way to do it using just numpy arrays and pandas Series. – Paul Jan 20 '16 at 20:14
  • @Paul Hmm, interesting. I was really hoping it would be fast! Maybe it would be with a larger dataset and lookup! – Divakar Jan 20 '16 at 20:21

You can just flatten the array, then reshape:

data_map[lab2d.ravel()].to_numpy().reshape(lab2d.shape)
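
For completeness, a quick check with the data_map and lab2d from the question reproduces the expected output:

>>> data_map[lab2d.ravel()].to_numpy().reshape(lab2d.shape)
array([[0, 1],
       [1, 0],
       [6, 2],
       [2, 4]])

ravel returns a flat view of lab2d where possible, so the only real cost here is the label lookup itself.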
  • Switching the accepted answer to this one, since this is actually about as fast as the `dict` based method, and is much more readable and straightforward. – Paul Mar 25 '16 at 20:40