Python Pandas: Calculate moving average within group

Question

I have a dataframe containing time series for 100 objects:

object  period  value 
1       1       24
1       2       67
...
1       1000    56
2       1       59
2       2       46
...
2       1000    64
3       1       54
...
100     1       451
100     2       153
...
100     1000    21

I want to calculate moving average with window 10 for the value column. I guess I have to do something like

df.groupby('object').apply(lambda ~calculate MA~)

and then merge this Series to the original dataframe by object? Can't figure out exact commands

the moving averages would be less rows, which periods should they be assigned to? — Walter Tross, Nov 16 '18 at 13:48
It would be ideal to do like this: for period 1, the MA equals just value from period 1. From period 2, MA = (value_1 + value_2) / 2, and so on until 10. After 10, it's a normal moving average — Alexandr Kapshuk, Nov 16 '18 at 13:52
I'm trying to use pd.rolling_mean(), but didn't figure it out yet — Alexandr Kapshuk, Nov 16 '18 at 13:52

score 62 · Accepted Answer · answered Nov 16 '18 at 13:52

62

You can use rolling with transform:

df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean())

The 1 in rolling is for minimum number of periods.

answered Nov 16 '18 at 13:52

zipa

27,316
6
40
58

3

Oh it's so beautiful! I love Pandas! – Alexandr Kapshuk Nov 16 '18 at 13:55
1

This is the recommended way: I was going to answer with `pd.rolling_mean` and got the deprecation warning. So, thank you for teaching me something today @zipa – Charles Landau Nov 16 '18 at 13:55
How can I calculate Bolinger Band values within group using [talib.BBANDS()](https://www.programcreek.com/python/example/92323/talib.BBANDS)? Follwing code gives an error. (BBANDS() returns a tuple (upper, middle, lower).) `df['bb_up'], df['bb_mid'], df['bb_low'] = df.groupby('object')['value'].transform(lambda x: talib.BBANDS(x))` – David Johns Feb 20 '21 at 10:22
Space Impact's solution is 2x faster and gives same results. – Orhan Celik Jan 27 '22 at 18:44
1

@OrhanCelik Space Impact solution does not give me same result unfortunately – Alexander Jul 06 '22 at 20:44

Space Impact · Answer 2 · 2020-06-05T16:43:02.430

35

You can use rolling on groupby object directly as:

df['moving'] = df.groupby('object').rolling(10)['value'].mean()

The new pandas version throws an error when used direct assign to the column so use:

df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)

edited Jun 05 '20 at 16:43

answered Nov 16 '18 at 13:58

Space Impact

13,085
23
48

1

Hi, I'm getting an error: `TypeError: incompatible index of inserted column with frame index`. No idea what's the problem... – Alexandr Kapshuk Nov 16 '18 at 15:17
Could you please provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). – Space Impact Nov 16 '18 at 15:18
@AlexandrKapshuk It will work you can check this [`groupby.rolling`](https://pandas.pydata.org/pandas-docs/stable/groupby.html#new-syntax-to-window-and-resample-operations) – Space Impact Nov 16 '18 at 15:20
1

This should be the preferred solution. No need to use `transform`. – jtorca Mar 13 '20 at 22:40
I was able to use this approach to generate a Series, however, I get the error @AlexandrKapshuk mentions when trying to assign the Series as a column in the original df. I also note that the groupby.rolling link @SandeepKadapa mentions only generates a Series: `df = pd.DataFrame.from_records([{'object': obj, 'value': randint(1, 100)} for obj in range(100)]) s = df.groupby('object').rolling(10)['value'].mean() df['avg'] = s` – Sarah Messer Jun 05 '20 at 16:10
@SarahMesser The error is `TypeError: incompatible index of inserted column with frame index` so you need to reset index. Use `df['avg'] = s.reset_index(drop=True) `. – Space Impact Jun 05 '20 at 16:41
1

That clears the error, but df['avg'] is entirely nulls for me. My pandas version is 0.24.2 – Sarah Messer Jun 05 '20 at 18:26
2

@SarahMesser Thats because you have only one value in each group, check that `object` has only unique values that means only one value per group on which you are calculating rolling and because of this you are getting `NaN's` – Space Impact Jun 05 '20 at 18:29
1

Oops. I deleted the 'period' loop in my list comprehension in an attempt to simplify the dataframe and make the example minimal. Adding it back in fixed the nulls, as you say. – Sarah Messer Jun 05 '20 at 18:44
1

I added a shift(1) between the mean() and reset_index(), because I didn't want to include period-2's value in the calculation of period-2's mean... This seems to have jumbled up the values by shifting values across the multiindex, from one "object" group to the next. Small, manually-constructed dataframes also seem dependent on the original index. It's probably necessary to run `df.sort_values(by=['object', 'period']).reset_index(drop=True)` _before_ your groupby-rolling construction so that the indices are guaranteed to align. – Sarah Messer Jun 10 '20 at 18:59
1

The values can be added back in the wrong way if the dataframe isn't sorted by the groups, but if you just drop the new level, it'll work nicely: `.reset_index(level='value', drop=True)` – Jacob Eggers Jul 08 '20 at 05:03
In pandas 1.3.4 it seems the reset index is no longer needed. – Reed Richards Dec 11 '21 at 12:39
1

This solution is 2x faster than zipa's accepteed solution and gives same results. – Orhan Celik Jan 27 '22 at 18:46
1

This will not work when the index of df is not 0,1,2,... as the index of `df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)` will be 0,1,2... (tested in pandas 1.0.1) – FinThusiast Mar 25 '22 at 08:35

sousben · Answer 3 · 2020-08-19T15:20:49.777

12

The answers provided may not produce the desired results if you are grouping on multiple columns.

The following should cut it:

df['moving'] = df.groupby(['col_1', 'col_2', 'col_3']).rolling(10)['value'].mean().droplevel(level=[0,1,2])

edited Aug 19 '20 at 15:20

answered Jun 01 '20 at 15:00

sousben

333
2
10

2

This should be the top answer +1! (.droplevel can be replaced by .reset_index, btw). – psarka Jul 14 '20 at 15:30

score 1 · Answer 4 · answered Apr 01 '20 at 20:02

Extending the answer from @Sandeep Kadapa:

df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)

The reason for reset_index is because after df.groupby we end up with a Multi Level Index and at the assignment we will get error TypeError: incompatible index of inserted column with frame index

mrhee · Answer 5 · 2020-07-08T18:45:18.093

These solutions assume the dataframe is sorted in a particular way (by object and period). For example, if the data were organized in panels (by period and object), then the assignment will fail. One general solution irrespective of sorting order is the following:

df.loc[:, 'value_sma_10'] = df.groupby(by='object')[['object', 'period']].rolling(window=10, min_periods=1, on='period').mean().reset_index(level='object')['value']

score 0 · Answer 6 · answered Apr 05 '20 at 09:03

Create a column as a chain method:

(
    df
        .assign(
            column_name = lambda x: 
                x
                    .groupby(['object'])['value']
                    .transform(lambda x: x.rolling(10)
                    .mean())
        )
)

Python Pandas: Calculate moving average within group

6 Answers6

Linked

Related