1

Suppose the below simplified dataframe. (The actual df is much, much bigger.) How does one assign values to a new column f such that f is a function of another column (e.g. e)?

df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df
    a       b
    d   e   d   e
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16

Desired Output:

    a             b
    d   e   f     d   e   f
0   1   2 nan     3   4   nan
1   5   6 1.10    7   8   0.69
2   9  10 0.51   11  12   0.41
3  13  14 0.34   15  16   0.29

where column f is computed as np.log(df['e']).diff()

jgg
  • 791
  • 4
  • 17

2 Answers2

1

You could access the MultiIndex column using loc, then use the functions directly on the sliced column, then join it back to df:

import numpy as np
df = (df.join(np.log(df.loc[:, (slice(None), 'e')])
              .diff().round(2).rename(columns={'e':'f'}, level=1))
      .sort_index(axis=1))

Output:

    a             b          
    d   e     f   d   e     f
0   1   2   NaN   3   4   NaN
1   5   6  1.10   7   8  0.69
2   9  10  0.51  11  12  0.41
3  13  14  0.34  15  16  0.29
  • Thanks. I was able to solve using a dictionary comp and then concatentation which seemed more natural. `df = {c:df[c].assign(r=np.log(df[(c,'d')]).diff()) for c in df.columns.levels[0]}` `df = pd.concat([df[c] for c in df.keys()], axis=1, keys = df.keys())` – jgg Apr 16 '22 at 15:47
  • @jgg yeah that works too –  Apr 16 '22 at 17:57
0
df = {c:df[c].assign(r=np.log(df[(c,'d')]).diff()) for c in df.columns.levels[0]} 

df = pd.concat([df[c] for c in df.keys()], axis=1, keys = df.keys())
jgg
  • 791
  • 4
  • 17