0
columns: datetime | clientid | amounts | *new_column_to_be_implemented* (rolling mean of values before but only for values that are the same in clientid)
          `day 1` |    2     |  50     |  (na)
          `day 2` |    2     |  60     |  50
          `day 3` |    1     |  45     |  (na)
          `day 4` |    2     |  45     |  110
          `day 5` |    3     |  90     |  (na)
          `day 6` |    3     |  10     |  90
          `day 7` |    2     |  10     |  105
  1. so this gets the mean of the last 2 amounts of the same clientid for example.
  2. I know it is possible to add a list and append/pop values to remember them, but is there a better way in pandas?

1 Answers1

2

Please make sure to following the guidelines described in How to make good reproducible pandas examples when asking pandas related questions, it helps a lot for reproducibility.

The key element for the answer is the pairing of the groupby and rolling methods. groupby will group all the records with the same clientid and rolling will select the correct amount of records for the mean calculation.

import pandas as pd
import numpy as np

# setting up the dataframe
data = [
          ['day 1', 2, 50],
          ['day 2', 2, 60],
          ['day 3', 1, 45],
          ['day 4', 2, 45],
          ['day 5', 3, 90],
          ['day 6', 3, 10],
          ['day 7', 2, 10]
]

columns = ['date', 'clientid', 'amounts']

df = pd.DataFrame(data=data, columns=columns)

rolling_mean = df.groupby('clientid').rolling(2)['amounts'].mean()
rolling_mean.index = rolling_mean.index.get_level_values(1)

df['client_rolling_mean'] = rolling_mean
arhr
  • 1,505
  • 8
  • 16
  • thanks a lot! I will follow the guidelines in future questions! – Bernar van Tongeren Feb 05 '21 at 09:25
  • one last question the rolling_mean.index.get_level_values(1) part is it to reset the index as it was? so it isn't grouped anymore? or is there more to it? – Bernar van Tongeren Feb 05 '21 at 14:50
  • @BernarvanTongeren indeed, after the successive `groupby` and `rolling` we have a `MultiIndex` corresponding to `(clientid, old_index)`, the `get_level_values` accesses the `old_index` – arhr Feb 05 '21 at 16:01
  • If this answers your question, please mark my response as 'Answer' – arhr Feb 05 '21 at 16:12