I would like to build a function that makes working with dynamic multi-index dataframe filtering more user friendly.
For example, the function takes a dictionary of index labels and filter values, and tuple of metrics.
In order for this to work, the function needs to make no assumptions about the existence or ordering of index labels. The closest thing I've found is the df.xs()
.
sample code:
df = pd.DataFrame({'lab1': np.random.choice(['A','B','C'],100,replace=True), 'lab2': np.random.choice(['one','two','three','four'],100,replace=True), 'val': np.random.rand(100)})
df = df.groupby(['lab1','lab2']).sum()
val
lab1 lab2
A four 3.296221
one 5.057798
three 3.443166
two 3.913044
B four 3.815448
one 3.892152
three 2.995777
two 9.715343
C four 6.118737
one 3.735783
three 2.461903
two 5.252095
here's a static example using .xs():
df.xs(('A', slice('one','three')), level=['lab1','lab2'])
val
lab1 lab2
A one 5.057798
three 3.443166
The issue seems to be that you can't easily pass a list argument into slice()
. I've tried using pd.IndexSlice, map, lambda, etc but can't get it to work.
What id like to get is this:
filters = {
'lab1': 'A',
'lab2' : ('one','three'),
metrics = ('val')
}
def metric_ts(filters, metrics):
levels = list(filters.keys()) + ['metric_name']
keys = map(slice, list(filters.values()))
return df_norm.xs(keys, levels)
Note: I understand that there are several ways to do this using .loc[], etc. I am looking for a very general solution that does not rely on positional syntax. Thanks!