0

I have a dataframe with MultiIndex columns (2 levels) and the labels of the second levels are not the same for the different groups of the first column. Consider this example:

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=list('abc'))
df1.columns = pd.MultiIndex.from_product((["df1"], df1.columns))
df2 = pd.DataFrame(np.arange(100, 112).reshape(4, 3),
                   columns=list('def'))
df2.columns = pd.MultiIndex.from_product((["df2"], df2.columns))
df = pd.concat((df1, df2), axis=1)

It looks something like this

    df1         df2
    a   b   c   d   e   f
0   0   1   2   100 101 102
1   3   4   5   103 104 105
2   6   7   8   106 107 108
3   9   10  11  109 110 111

Now I would like to find a simple way to retrieve one or more columns of the first and the second group. I.e. something like

    df1     df2
    a   b   d
0   0   1   100
1   3   4   103
2   6   7   106
3   9   10  109

However, whatever I try, doesn't work. For example,

df.loc[:, [('df1', 'a'), ('df2', ['d', 'e'])]]

throws TypeError: unhashable type: 'list'. The same is true for

df.loc[:, [pd.IndexSlice['df1', 'a'], pd.IndexSlice['df2', ['d', 'e']]]]

Is there a way of selecting these columns without using pd.concat?

zeawoas
  • 446
  • 7
  • 18

2 Answers2

1

Your question is similar to Python Pandas slice multiindex by second level index (or any other level)

Then df.loc[:, (slice(None), ('a', 'b', 'd'))] should give the expected result.

Guillaume Ansanay-Alex
  • 1,212
  • 2
  • 9
  • 20
  • This is a great answer. However, I just realised that I have worded the question in a wrong way. I would also like to be able to take all columns of the first group and a subset of the second. I will ask another one. – zeawoas Apr 09 '21 at 10:15
  • @zeawoas - su list of tuples is not possible? – jezrael Apr 09 '21 at 10:23
  • 1
    @jezrael it is possible, but my actual use case is a lot more complicated and I would like to query the dataframe programmatically. For this, this is a better solution. I still upvoted your answer. – zeawoas Apr 09 '21 at 10:28
1

You can pass a list of tuples as : df.loc[:,[('df1', 'a'),('df1', 'b'),('df2','d')]

DeGo
  • 783
  • 6
  • 14