5

I have a dataframe with column multiindex that I need to slice and perform math operations between the slices.

# sample df
idx=pd.IndexSlice
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

If I wanted to perform say addition/subtraction between individual columns, I could use index slice and do it like this:

df.loc[:,idx['three','foo']] - df.loc[:,idx['two','foo']]

However, if I want to use higher level slice it doesn't work and return NaNs:

# not working
df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]]

Is there an easy way to use higher level slices of the df and add/subtract corresponding columns only? My dataframe potentially contains hundreds of columns in multiindex. Thanks

whada
  • 306
  • 3
  • 15

3 Answers3

6

If need MultiIndex in output use rename for same level od MultiIndex:

df = df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]].rename(columns={'two':'three'})
print (df)
first      three          
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137

Advantage is possible rename both levels to new index names and join to original:

df = (df.join(df.loc[:,idx['three',:]].rename(columns={'three':'four'}) - 
              df.loc[:,idx['two',:]].rename(columns={'two':'four'})))
print (df)
first        one                 two               three                four  \
second       foo       bar       foo       bar       foo       bar       foo   
A      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437 -0.861579   
B      -2.426679 -0.428913  1.265936 -0.866740 -0.678886 -0.094709 -1.944822   
C       1.491390 -0.638902 -0.443982 -0.434351  2.205930  2.186786  2.649912   

first             
second       bar  
A       3.157731  
B       0.772031  
C       2.621137  

If not necessary, use DataFrame.xs:

df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
print (df1)
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137

If need first level one possible solution is MultiIndex.from_product:

df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
df1.columns = pd.MultiIndex.from_product([['new'], df1.columns], 
                                         names=['first','second'])
print (df1)
first        new          
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, the rename options works like a charm. Just one more thing - if I wanted to rename the lower level in an output as well, what's the best way to accomplish it? say something like four/ bar_delta, foo_delta ? – whada Mar 06 '19 at 10:45
  • @whada - use `df = df.rename(columns={'foo':'bar'}, level=1)`, but is necessary rename all values, because same problem, get NaNs. – jezrael Mar 06 '19 at 10:46
1

You could try DataFrame.xs (cross-section) :

df.xs(('three'), axis=1) - df.xs(('two'), axis=1)
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
1

In this particular case, the easiest way is probably:

df.loc[:, 'two'] - df.loc[:, 'three']

Nearly everything you need to know about MultiIndex can be in @coldspeed 's canonicals

JoergVanAken
  • 1,286
  • 9
  • 10