1

I would like to add columns into a Pandas multiindex dataframe, which will contain the result of an operation performed on other columns.

I have a dataframe similar to this one:

first   bar     baz     
second  one two one two
A       5   2   9   2   
B       6   4   7   6   
C       5   4   5   1   

Now, for each group in the dataframe, I'd like to add a column "three" which equals column "one" minus column "two":

first   bar             baz     
second  one two three   one two three
A       5   2   3       9   2   7
B       6   4   2       7   6   1
C       5   4   1       5   1   4

In reality my dataframe is much larger. I'm struggling to find the answer to this (hopefully) easy question. Any suggestions are appreciated.

  • 1
    Having a look to this topic should help you : https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas – Liris Oct 29 '18 at 14:12
  • With a `multiIndex` on the columns you can still access the columns by providing a tuple: `df[('bar', 'one')]`, which may be the simplest method in certain situations. – ALollz Oct 29 '18 at 14:23

2 Answers2

1

Create your append df by using MultiIndex

s=pd.DataFrame([[1,2],[2,3],[3,4]],columns=pd.MultiIndex.from_arrays([['bar','baz'],['three','three']]))
s
Out[458]: 
    bar   baz
  three three
0     1     2
1     2     3
2     3     4

Then we do concat

yourdf=pd.concat([df,s],axis=1).sort_index(level=0,axis=1)

If the order is matter , you can reindex or may consider factorized the level .

BENY
  • 317,841
  • 20
  • 164
  • 234
1

Use DataFrame.xs for select one and two levels and subtract, then create MultiIndex in column by MultiIndex.from_product:

df1 = df.xs('one', axis=1, level=1) - df.xs('two', axis=1, level=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['three']])
print (df1)
    bar   baz
  three three
A     3     7
B     2     1
C     1     4

Then concat to original and for change ordering use reindex by helper MultiIndex:

mux = pd.MultiIndex.from_product([['bar','baz'], ['one','two','three']], 
                                  names=df.columns.names)
df = pd.concat([df, df1], axis=1).reindex(columns=mux)
print (df)
first  bar           baz          
second one two three one two three
A        5   2     3   9   2     7
B        6   4     2   7   6     1
C        5   4     1   5   1     4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • The first part works like a charm! Is there a way to extract the multiindex names ('bar' and 'baz' in this example?) – Thomas Balder Oct 29 '18 at 14:31
  • @ThomasBalder - yes, sure, use `print (df1.columns.get_level_values(0))` – jezrael Oct 29 '18 at 14:33
  • 1
    `mux = pd.MultiIndex.from_product([df.columns.get_level_values(0).unique(), ['one','two','three']], names=df.columns.names)` Works! Thanks a lot! – Thomas Balder Oct 29 '18 at 14:43