2

I need some help with groupby and expanding mean in python pandas.

I am trying to use pandas expanding mean and groupby. In this image below, I want to group by using the id column and expand mean by date. But the catch is for January I am not using expanding mean. For example, you can think January might be a past month and take the overall mean of the value column and grouping by ids.

For February and March I want to use expanding mean of value column on top of January. So for 7 Feb and id 1, the 44.5 in expanding mean column is basically mean of January before the value of 89 occurs today. The next value for id 1 is on 7-Mar which is inclusive of previous value of 89 on 7 Feb for id = 1.

So basically my idea is taking the overall mean upto Feb 1, and then use expanding mean on top of whatever mean has been calculated upto that date.

id    date    value   count(prior)   expanding mean (after feb)
1     1-Jan    28      4              44.75
2     1-Jan    43      3              37.33
3     1-Jan    69      3              57.00
1     2-Jan    31      4              44.75
2     2-Jan    22      3              37.33
1     7-Jan    82      4              44.75
2     7-Jan    47      3              37.33
3     7-Jan    79      3              57.00
1     8-Jan    38      4              44.75
3     8-Jan    23      3              57.00
1     7-Feb    89      4              44.75
2     7-Feb    22      3              37.33
3     7-Feb    80      3              57.00
2     19-Feb   91      4              33.50
3     19-Feb   97      4              62.75
1     7-Mar    48      5              53.60
2     7-Mar    98      5              45.00
3     7-Mar    35      5              69.60

I've given the count columns as a reference to how the count is increasing. It basically means everything prior to that date.

  • Please post your data in your question instead of in a screenshot – ignoring_gravity Jan 08 '20 at 10:17
  • I'm sorry this is my first post on stack overflow. I am not being able to share data as the format gets ruined. – yash_shah77 Jan 08 '20 at 10:19
  • Can you paste it in the question? If you follow the advice [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), you'll increase your chances of receiving a good answer :) – ignoring_gravity Jan 08 '20 at 10:19
  • Thanks. Not sure I understand the question though - what's the logic behind the values that appear in `expanding mean` before february? – ignoring_gravity Jan 08 '20 at 10:42
  • 1
    The logic is that January is a past month for which we already have past data. Hence I've only calculated the mean for January entries using groupby. February and March are like future events, so once an entry occurs in the value column, I can add it to the expanding mean upto that date. For example, on 7-Mar for id 1, the mean is of the following numbers from `'value'` column: (28, 31, 82, 38 and 89). These are all id 1's values prior to 7-Mar. The value 48 occuring on 7-Mar is considered to be future value and hence only till 7-Feb (last occurring date) – yash_shah77 Jan 08 '20 at 10:56
  • If there was another row in February with `id1`, what would its `expanding mean` column value be? – ignoring_gravity Jan 08 '20 at 11:44
  • If there was another row in february, say with date Feb 11th, then that expanding mean would be the mean of values in Jan and the value on Feb 7. Meaning it would be a mean of every value occuring before Feb 11th. – yash_shah77 Jan 09 '20 at 08:45

0 Answers0