Efficient way to perform operations (rolling mean/add new columns) in each group from pandas groupby

Question

I have a dataframe like below, with 2 different SecuCode. I'd need to group them by column SecuCode, create a new column from Volume and AShare, then do a rolling mean.

    SecuCode    TradingDay  Volume  AShare
0   600455.SH   2013-01-04  1484606 49717768
1   600455.SH   2013-01-07  1315166 49717768
2   600455.SH   2013-01-08  1675933 49717768
3   600455.SH   2013-01-09  1244098 49717768
4   600455.SH   2013-01-10  751279  49717768
5   600551.SH   2018-12-24  1166098 505825296
6   600551.SH   2018-12-25  3285799 505825296
7   600551.SH   2018-12-26  3534143 505825296
8   600551.SH   2018-12-27  2462501 505825296
9   600551.SH   2018-12-28  2282954 505825296

for a single dataframe, the code will be like:

df['volumn_percentage'] = df['Volume']/df['AShare']
df['turnover'] = df['volumn_percentage'].rolling(2).mean()

I need to do these multi-step operations in each frame from

df_grouped = df.groupby(by='SecuCode')
state, frame = next(iter(df_grouped))

I wonder how can I do the same thing in each group of this dataframe, and then restore to the original dataframe format? there will be ~1000 unique SecuCode, and there will be ~1000 trading days, so looping through all frame/groups seems to be very slow. I may need to do more operations other than the above 2 (1 column divided by another, rolling mean), so a more general ways is much appreciated.

`df.groupby('SecuCode')['volumn_percentage'].rolling(2).mean()`? — Quang Hoang, Aug 07 '20 at 22:16
@QuangHoang I also need the new column in the grouped dataframe for later usage, so this seems not quite work for this case — Partita, Aug 07 '20 at 22:33
@TrentonMcKinney not really, I need to add 2 columns in each grouped dataframe. — Partita, Aug 07 '20 at 23:46

Trenton McKinney · Accepted Answer · 2020-08-08T00:06:52.237

Use pandas.DataFrame.groupby and .apply a function with your calculations

import pandas as pd

data = {'SecuCode': ['600455.SH', '600455.SH', '600455.SH', '600455.SH', '600455.SH', '600551.SH', '600551.SH', '600551.SH', '600551.SH', '600551.SH'],
        'TradingDay': ['2013-01-04', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10', '2018-12-24', '2018-12-25', '2018-12-26', '2018-12-27', '2018-12-28'],
        'Volume': [1484606, 1315166, 1675933, 1244098, 751279, 1166098, 3285799, 3534143, 2462501, 2282954],
        'AShare': [49717768, 49717768, 49717768, 49717768, 49717768, 505825296, 505825296, 505825296, 505825296, 505825296]}

df = pd.DataFrame(data)

# function with calculations
def calcs(df: pd.DataFrame) -> pd.DataFrame:
    df['volumn_percentage'] = df['Volume']/df['AShare']
    df['turnover'] = df['volumn_percentage'].rolling(2).mean()
    return df


# groupby and apply the function with the calculations
df_new = df.groupby('SecuCode').apply(calcs)

# print(df_new)
    SecuCode  TradingDay   Volume     AShare  volumn_percentage  turnover
0  600455.SH  2013-01-04  1484606   49717768           0.029861       NaN
1  600455.SH  2013-01-07  1315166   49717768           0.026453  0.028157
2  600455.SH  2013-01-08  1675933   49717768           0.033709  0.030081
3  600455.SH  2013-01-09  1244098   49717768           0.025023  0.029366
4  600455.SH  2013-01-10   751279   49717768           0.015111  0.020067
5  600551.SH  2018-12-24  1166098  505825296           0.002305       NaN
6  600551.SH  2018-12-25  3285799  505825296           0.006496  0.004401
7  600551.SH  2018-12-26  3534143  505825296           0.006987  0.006741
8  600551.SH  2018-12-27  2462501  505825296           0.004868  0.005928
9  600551.SH  2018-12-28  2282954  505825296           0.004513  0.004691

Thanks very much! I think it is indeed a good way to do that — Partita, Aug 08 '20 at 00:10

Efficient way to perform operations (rolling mean/add new columns) in each group from pandas groupby

1 Answers1