Let's say we have a DataFrame with multiple levels of column headers:
level_0         A                   B                   C
level_1         P                   P                   P
level_2         x         y         x         y         x         y
0       -1.027155  0.667489  0.314387 -0.428607  1.277167 -1.328771
1        0.223407 -1.713410  0.480903 -3.517518 -1.412756  0.718804
I want to select a list of columns from a named level.
required_columns = ['A', 'B']
required_level = 'level_0'
Method 1: (deprecated in favor of df.loc)
print(df.select(lambda x: x[0] in required_columns, axis=1))
The problem with this is that I have to specify the level by its position (the 0 in x[0]). It fails if I use the name of the level instead.
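For what it's worth, the positional lookup in Method 1 can be driven by the level name if the name is first translated into a position. The following is only a minimal sketch of that idea, not a confirmed fix: `level_pos`, `mask`, and `selected` are names made up for illustration, and a plain boolean list with `df.loc` stands in for `df.select`.

```python
import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['P'], ['x', 'y']],
    names=['level_0', 'level_1', 'level_2'],
)
df = pd.DataFrame(np.random.randn(2, 6), columns=header)

required_columns = ['A', 'B']
required_level = 'level_0'

# Translate the level name into its integer position in each column tuple.
# (level_pos is an illustrative name, not part of the original question.)
level_pos = df.columns.names.index(required_level)

# Same test as the lambda in Method 1, but indexed by the looked-up position.
mask = [col[level_pos] in required_columns for col in df.columns]
selected = df.loc[:, mask]
print(selected)
```

This keeps all three header levels on the result, since whole columns are kept or dropped.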
Method 2:
print(df.xs('A', level=required_level, axis=1))
The problem with this is that I can only specify a single value. It fails if I pass the list ['A', 'B'].
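One possible way around the single-value limit of Method 2 is to call `xs` once per key and glue the pieces back together. This is a hedged sketch rather than a recommended answer: it assumes `drop_level=False` is acceptable (so the selected level stays in the header), and `parts` and `selected` are illustrative names.

```python
import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['P'], ['x', 'y']],
    names=['level_0', 'level_1', 'level_2'],
)
df = pd.DataFrame(np.random.randn(2, 6), columns=header)

required_columns = ['A', 'B']
required_level = 'level_0'

# One cross-section per requested key; drop_level=False keeps the
# selected level in the column header so the pieces stay distinguishable.
parts = [
    df.xs(key, level=required_level, axis=1, drop_level=False)
    for key in required_columns
]

# Concatenate the per-key pieces side by side.
selected = pd.concat(parts, axis=1)
print(selected)
```

The result follows the order of `required_columns`, which may differ from the column order in `df`.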
Method 3:
print(df.loc[:, df.columns.get_level_values(required_level).isin(required_columns)])
This works, but isn't as concise as the previous two methods! :)
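A slightly shorter spelling of the same boolean-mask idea may be possible, because `MultiIndex.isin` accepts a `level` argument that can be a level name. The sketch below assumes a pandas version where that argument is available. `mask` and `selected` are illustrative names.

```python
import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['P'], ['x', 'y']],
    names=['level_0', 'level_1', 'level_2'],
)
df = pd.DataFrame(np.random.randn(2, 6), columns=header)

required_columns = ['A', 'B']
required_level = 'level_0'

# Boolean mask over the columns: True where the named level's label
# is one of the requested values.
mask = df.columns.isin(required_columns, level=required_level)

selected = df.loc[:, mask]
print(selected)
```

This skips the separate `get_level_values` call while still addressing the level by name.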
Question:
How can I get method 1 or 2 to work? Or, is there a more pythonic way?
The MWE:
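import numpy as np
import pandas as pd

# Three column levels: level_0 (A/B/C), level_1 (P), level_2 (x/y).
header = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['P'], ['x', 'y']],
    names=['level_0', 'level_1', 'level_2'],
)
df = pd.DataFrame(np.random.randn(2, 6), columns=header)

required_columns = ['A', 'B']
required_level = 'level_0'

print(df)
# Method 1: the level is addressed by position (x[0]), not by name.
print(df.select(lambda x: x[0] in required_columns, axis=1))
# Method 2: accepts the level name, but only a single key.
print(df.xs('A', level=required_level, axis=1))
# Method 3: boolean mask built from the named level.
print(df.loc[:, df.columns.get_level_values(required_level).isin(required_columns)])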