0

How do I select a particular column from a DataFrame when there are multiple levels of naming?

>>>  x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

(Note: this question is asked in the comments to this SO question and there is an answer also in the comments. Thought it'd be good to have it as a question in its own right.)

Community
  • 1
  • 1
LondonRob
  • 73,083
  • 37
  • 144
  • 201

1 Answers1

1

This is exactly the purpose of Multiindex slicers, see docs here

In [15]: idx = pd.IndexSlice

In [16]: x.loc[:,idx[:,'a']]
Out[16]: 
instance     first
foo              a
bar       0.525356

In [17]: x.loc[:,idx[:,['a','c']]]
Out[17]: 
instance     first          
foo              a         c
bar       0.525356  0.418152
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • 1
    Is there anyone in the world who can actually understand this? I've read the docs and they boggle my mind. – LondonRob Aug 20 '14 at 11:22
  • Also: is there a way to do this in `0.13.1`? – LondonRob Aug 20 '14 at 11:23
  • in older versions you can use ``.xs`` (for a single label but on on any level/axis), or ``x.loc[:,x.columns.get_level_values(1).isin(['a','c])]`` for multiple – Jeff Aug 20 '14 at 11:58