I have a dataframe with MultiIndex columns (2 levels) and the labels of the second levels are not the same for the different groups of the first column. Consider this example:
df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
columns=list('abc'))
df1.columns = pd.MultiIndex.from_product((["df1"], df1.columns))
df2 = pd.DataFrame(np.arange(100, 112).reshape(4, 3),
columns=list('def'))
df2.columns = pd.MultiIndex.from_product((["df2"], df2.columns))
df = pd.concat((df1, df2), axis=1)
It looks something like this
df1 df2
a b c d e f
0 0 1 2 100 101 102
1 3 4 5 103 104 105
2 6 7 8 106 107 108
3 9 10 11 109 110 111
Now I would like to find a simple way to retrieve one or more columns of the first and the second group. I.e. something like
df1 df2
a b d
0 0 1 100
1 3 4 103
2 6 7 106
3 9 10 109
However, whatever I try, doesn't work. For example,
df.loc[:, [('df1', 'a'), ('df2', ['d', 'e'])]]
throws TypeError: unhashable type: 'list'
. The same is true for
df.loc[:, [pd.IndexSlice['df1', 'a'], pd.IndexSlice['df2', ['d', 'e']]]]
Is there a way of selecting these columns without using pd.concat
?