I have a list of customers, dates and scores:

import pandas as pd
import datetime as dt
import numpy as np
data = pd.DataFrame(
        np.array(
            [
                ["A", dt.datetime(2017, 12, 10), 10.0],
                ["A", dt.datetime(2018, 1, 10), 10.0],
                ["A", dt.datetime(2018, 1, 15), 11.0],
                ["A", dt.datetime(2018, 1, 16), 12.0],
                ["A", dt.datetime(2018, 1, 16), 13.0],
                ["B", dt.datetime(2018, 1, 16), 10.0],
                ["A", dt.datetime(2018, 3, 1), 10.0],
            ]
        ),
        columns=["Customer", "Date", "Score"],
    )

    Customer    Date    Score
0   A   2017-12-10 00:00:00 10
1   A   2018-01-10 00:00:00 10
2   A   2018-01-15 00:00:00 11
3   A   2018-01-16 00:00:00 12
4   A   2018-01-16 00:00:00 13
5   B   2018-01-16 00:00:00 10
6   A   2018-03-01 00:00:00 10

For each customer, I would like to calculate the average score over the 14 days up to and including each row's date. The result should look like:

    Customer    Date    Score   Result
0   A   2017-12-10 00:00:00 10  10
1   A   2018-01-10 00:00:00 10  10
2   A   2018-01-15 00:00:00 11  10.5
3   A   2018-01-16 00:00:00 12  11.5
4   A   2018-01-16 00:00:00 13  11.5
5   B   2018-01-16 00:00:00 10  10
6   A   2018-03-01 00:00:00 10  10

Thanks!!

Carsten
  • Does this answer your question? [pandas groupby rolling mean/median with dropping missing values](https://stackoverflow.com/questions/56872205/pandas-groupby-rolling-mean-median-with-dropping-missing-values) – RichieV Aug 31 '20 at 15:06
  • No, that's not it, but thanks – Carsten Aug 31 '20 at 15:30

1 Answer

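Group the data by Customer with DataFrame.groupby and compute the rolling mean of Score over a 14-day window. Because several rows can share the same customer and date, keep only the last rolling value for each (Customer, Date) pair, then use DataFrame.merge to join the rolling average back onto data: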

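# Rolling 14-day mean of Score per customer, indexed by (Customer, Date)
avg = data.set_index('Date').groupby('Customer').rolling('14d')['Score'].mean()
# For rows sharing a customer and date, keep the last value, which covers all of them
avg = avg[~avg.index.duplicated(keep='last')]

# Attach the averages to the original rows
df = data.merge(avg.rename('Result'), left_on=['Customer', 'Date'], right_index=True)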

Result:

print(df)
  Customer       Date Score  Result
0        A 2017-12-10    10    10.0
1        A 2018-01-10    10    10.0
2        A 2018-01-15    11    10.5
3        A 2018-01-16    12    11.5
4        A 2018-01-16    13    11.5
5        B 2018-01-16    10    10.0
6        A 2018-03-01    10    10.0
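
For reference, here is a self-contained sketch of the same idea that can be pasted and run as-is. It is one possible variant, not the only way to do it, and it makes a few assumptions that are not in the original post: the frame is built with explicit dtypes (a datetime Date column and a float Score column), the rows are sorted by customer and date before rolling so the time-based window sees a monotonic index, and the helper name add_rolling_mean and its parameters are made up for illustration. The window is spelled "14D", which denotes the same 14-day offset as '14d' above.

```python
import pandas as pd

# Sample data from the question, built with explicit dtypes (an assumption:
# Date is datetime64 and Score is float, rather than object columns).
data = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "A", "A", "B", "A"],
        "Date": pd.to_datetime(
            [
                "2017-12-10", "2018-01-10", "2018-01-15", "2018-01-16",
                "2018-01-16", "2018-01-16", "2018-03-01",
            ]
        ),
        "Score": [10.0, 10.0, 11.0, 12.0, 13.0, 10.0, 10.0],
    }
)


def add_rolling_mean(frame, window="14D", out_col="Result"):
    """Hypothetical helper: per-customer rolling mean of Score over `window`."""
    # Time-based windows need a monotonic datetime index within each group.
    ordered = frame.sort_values(["Customer", "Date"])
    avg = (
        ordered.set_index("Date")
        .groupby("Customer")["Score"]
        .rolling(window)
        .mean()
        .rename(out_col)
        .reset_index()  # back to plain columns: Customer, Date, out_col
    )
    # Several rows may share a (Customer, Date) pair; the last one has seen
    # every score recorded on that date, so keep it.
    avg = avg.drop_duplicates(subset=["Customer", "Date"], keep="last")
    # A left merge keeps the original row order and row count.
    return frame.merge(avg, on=["Customer", "Date"], how="left")


result = add_rolling_mean(data)
print(result)
```

On the sample data this should reproduce the Result column shown above. Merging on plain columns with how="left" is a design choice to keep every input row in its original order; the index-based merge in the answer is equivalent for this data.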
Shubham Sharma