This may be the wrong general approach, but I'm trying to use a Pandas series as essentially a lookup table for some numpy arrays of strings / labels:
import pandas as pd
import numpy as np
data_map = pd.Series([2, 4, 6, 0, 1], index=list('abcde'))
lab1d = np.array(['a', 'd', 'd', 'c'])
lab2d = np.array([['d', 'e'],
                  ['e', 'd'],
                  ['c', 'a'],
                  ['a', 'b']])
val1d = data_map.loc[lab1d]
val2d = data_map.loc[lab2d]
If I do this, val1d resolves correctly to:
a 2
d 0
d 0
c 6
dtype: int64
But val2d = data_map.loc[lab2d] raises a ValueError:
ValueError: Cannot index with multidimensional key
I think I get the reason why this does not work, but what is the proper way to take a numpy.ndarray
of size (m x n) containing Index values and return (something which can be converted to) a numpy.ndarray
of size (m x n) containing the corresponding values?
Edit
I had considered storing the data in a dictionary instead and using numpy.vectorize (as illustrated in this question), which is definitely my fallback, but I'm specifically interested in whether there's a way to do this using pandas methods.
Edit 2
I should clarify that I'm actually looking for something that follows broadcasting rules, e.g.:
dmd = data_map.to_dict()
make_map = np.vectorize(dmd.__getitem__)
val1d = make_map(lab1d)
val2d = make_map(lab2d)
Which for val1d returns:
array([2, 0, 0, 6])
And for val2d returns:
array([[0, 1],
       [1, 0],
       [6, 2],
       [2, 4]])
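For reference, one pandas-based workaround along these lines (not from the original post, just a sketch) is to flatten the 2-D label array before passing it to .loc, then reshape the result back to the original shape:

```python
import numpy as np
import pandas as pd

data_map = pd.Series([2, 4, 6, 0, 1], index=list('abcde'))
lab2d = np.array([['d', 'e'],
                  ['e', 'd'],
                  ['c', 'a'],
                  ['a', 'b']])

# .loc only accepts 1-D keys, so flatten, look up, and restore the shape.
val2d = data_map.loc[lab2d.ravel()].to_numpy().reshape(lab2d.shape)
```

This keeps the lookup inside pandas and returns a plain numpy array with the same (m x n) shape as the label array.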