8

This question is an attempt to generalise the solution provided for the this question:

Pandas: add a column to a multiindex column dataframe

I need to produce a column for each column index.

The solution provided by spencerlyon2 works when we want to add a single column:

df['bar', 'three'] = [0, 1, 2]

However I would like to generalise this operation for every first level column index.

Source DF:

In [1]: df
Out[2]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

Target DF below, requires that the three column is the addition of the one and two columns of its respective index.

In [1]: df
Out[2]:
first        bar                           baz                 
second       one       two     three       one       two      three
A      -1.089798  2.053026  0.963228‬  1.440740 -2.317647  -0.876907‬
B       0.488875  0.428836  0.917711 -0.683677  0.345873  -0.337804‬
C      -0.243064 -0.069446 -0.312510  0.478370  0.266761   0.745131‬
SO_tourist
  • 430
  • 6
  • 13

2 Answers2

3

You can use join with two data frames with same indexes to create a bunch of columns all at once.


First, calculate the sum using groupby against axis=1

ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()

        bar       baz
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

(PS: If you have more than two columns, you may do

df.loc[:, (slice(None), ['one', 'two'])].groupby(df.columns.get_level_values(0), axis=1).sum()

to slice only columns 'one' and 'two' first, and just then groupby)

Then, make it match your column indexes, i.e. make it a MultiIndexed data frame just like your original data frame

ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])

        bar       baz
      three     three
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

Finally, df.join

finaldf = df.join(ndf).sort_index(axis=1)

If you really care about the ordering, use reindex

finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
rafaelc
  • 57,686
  • 15
  • 58
  • 82
  • Thank you rafaelc for your reply. For some reason, I was getting a ValueError `ValueError("Grouper and axis must be same length")` within Groupby when using `df.columns.get_level_values(0)`. I changed this to `level=0` and it worked. Any idea what is raising the value error? – SO_tourist Nov 18 '19 at 10:02
  • @SO_tourist you likely forgot to add "axis=1" as an option in the groupby – rafaelc Nov 18 '19 at 14:04
1

I started from your sample input:

first        bar                 baz          
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

To add a new column to each level 0 of the column MultiIndex, you can run something like:

for c1 in df.columns.get_level_values('first').unique():
    # New column int index
    cInd = int(df.columns.get_loc(c1).stop)
    col = (c1, 'three')      # New column name
    newVal = df[(c1, 'one')] + df[(c1, 'two')]
    df.insert(loc=cInd, column=col, value=newVal)  # Insert the new column

In the above example, values in new columns are consecutive numbers, but in your case set them as you wish.

The result of my code (after the column sort) is:

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • Thank you for your reply. I amended my question slightly as I might not have explained myself clearly. I would require a single column to be added per index, and for the sake of example, the new column is the addition of the `one` and the `two` columns – SO_tourist Nov 16 '19 at 15:56
  • I corected my answer. Now a new column is added for each name at the top level of the column MultiIndex. – Valdi_Bo Nov 16 '19 at 16:32
  • How come the result is correct? OP explicitly stated `three` should be the sum of `one` and `two` haha ;p – rafaelc Nov 16 '19 at 16:33
  • You are right, initially I missed this detail. Now the new column contains the sum. – Valdi_Bo Nov 16 '19 at 16:44
  • Now looks correct! But even though this works, it will be very very very slow ;/ – rafaelc Nov 16 '19 at 16:48