2

I have a Pandas dataframe that looks like this:

| PLAYER  | DATE       | SCORE | GAME | 
|---------|------------|-------|------|
| Albert  | 2020-08-12 | 10    | X    |
| Barney  | 2020-08-12 | 100   | X    |
| Charlie | 2020-08-12 | 1000  | X    |
| Albert  | 2020-08-13 | 20    | X    |
| Barney  | 2020-08-13 | 200   | X    |
| Charlie | 2020-08-13 | 2000  | X    |
| Albert  | 2020-08-14 | 30    | Y    |
| Barney  | 2020-08-14 | 300   | Y    |
| Charlie | 2020-08-14 | 3000  | Y    |
| Albert  | 2020-08-15 | 40    | Y    |
| Barney  | 2020-08-15 | 400   | Y    |
| Charlie | 2020-08-15 | 4000  | Y    |
| Albert  | 2020-08-16 | 50    | Z    |
| Barney  | 2020-08-16 | 500   | Z    |
| Charlie | 2020-08-16 | 5000  | Z    |
| Albert  | 2020-08-17 | 60    | Z    |
| Barney  | 2020-08-17 | 600   | Z    |
| Charlie | 2020-08-17 | 6000  | Z    |

I`m trying to create a new column with 2-day score averages for each player as a subset, so that I get the following result:

| PLAYER  | DATE       | SCORE | GAME | 2-DAY AVG |
|---------|------------|-------|------|-----------|
| Albert  | 2020-08-12 | 10    | X    | NaN       |
| Barney  | 2020-08-12 | 100   | X    | NaN       |
| Charlie | 2020-08-12 | 1000  | X    | NaN       |
| Albert  | 2020-08-13 | 20    | X    | 15        | 
| Barney  | 2020-08-13 | 200   | X    | 150       |
| Charlie | 2020-08-13 | 2000  | X    | 1500      |
| Albert  | 2020-08-14 | 30    | Y    | 25        |
| Barney  | 2020-08-14 | 300   | Y    | 250       |
| Charlie | 2020-08-14 | 3000  | Y    | 2500      |
| Albert  | 2020-08-15 | 40    | Y    | 35        |
| Barney  | 2020-08-15 | 400   | Y    | 350       |
| Charlie | 2020-08-15 | 4000  | Y    | 3500      |
| Albert  | 2020-08-16 | 50    | Z    | 45        |
| Barney  | 2020-08-16 | 500   | Z    | 450       |
| Charlie | 2020-08-16 | 5000  | Z    | 4500      |
| Albert  | 2020-08-17 | 60    | Z    | 55        |
| Barney  | 2020-08-17 | 600   | Z    | 550       |
| Charlie | 2020-08-17 | 6000  | Z    | 5500      |

I've searched stack overflow and tried several combinations of code using groupby() with rolling.mean(2) functions, along with python conditional statements, but failed to do so.

Is there a clever way to do it in Pandas?

Zoivermey
  • 117
  • 1
  • 4

2 Answers2

2

This should do what you want:

df['2-DAY AVG'] = df.groupby('PLAYER').SCORE.apply(lambda x: x.rolling(2).mean())
addem
  • 111
  • 3
2

@addem already provided you the answer. Here's the full code. You can do it without the lambda function as well.

import pandas as pd
data = {'PLAYER':['Albert', 'Barney', 'Charlie',
                  'Albert', 'Barney', 'Charlie',
                  'Albert', 'Barney', 'Charlie',
                  'Albert', 'Barney', 'Charlie',
                  'Albert', 'Barney', 'Charlie',
                  'Albert', 'Barney', 'Charlie']
        'DATE':['2020-08-12','2020-08-12','2020-08-12',
                '2020-08-13','2020-08-13','2020-08-13',
                '2020-08-14','2020-08-14','2020-08-14',
                '2020-08-15','2020-08-15','2020-08-15',
                '2020-08-16','2020-08-16','2020-08-16',
                '2020-08-17','2020-08-17','2020-08-17'],
        'SCORE':[10,100,1000,20,200,2000,30,300,3000,
                 40,400,4000,50,500,5000,60,600,6000],
        'GAME':['X','X','X','X','X','X',
                'Y','Y','Y','Y','Y','Y',
                'Z','Z','Z','Z','Z','Z']}

df = pd.DataFrame(data)

df['2-DAY AVG'] = df.groupby('PLAYER')['SCORE'].rolling(2).mean().reset_index(0,drop=True)
print (df)

Output will be as follows:

     PLAYER        DATE  SCORE GAME  2-DAY AVG
0    Albert  2020-08-12     10    X        NaN
1    Barney  2020-08-12    100    X        NaN
2   Charlie  2020-08-12   1000    X        NaN
3    Albert  2020-08-13     20    X       15.0
4    Barney  2020-08-13    200    X      150.0
5   Charlie  2020-08-13   2000    X     1500.0
6    Albert  2020-08-14     30    Y       25.0
7    Barney  2020-08-14    300    Y      250.0
8   Charlie  2020-08-14   3000    Y     2500.0
9    Albert  2020-08-15     40    Y       35.0
10   Barney  2020-08-15    400    Y      350.0
11  Charlie  2020-08-15   4000    Y     3500.0
12   Albert  2020-08-16     50    Z       45.0
13   Barney  2020-08-16    500    Z      450.0
14  Charlie  2020-08-16   5000    Z     4500.0
15   Albert  2020-08-17     60    Z       55.0
16   Barney  2020-08-17    600    Z      550.0
17  Charlie  2020-08-17   6000    Z     5500.0


#just for fun, I created a fancier way to create the dict 'data'
#I was curious to find a simpler way to repeat the contents in 'data'
#both dicts data and data1 are same (before we perform mean operation)

data_1 = {'PLAYER':['Albert','Barney','Charlie']*6,
          'DATE':['2020-08-12']*3 + ['2020-08-13']*3 + ['2020-08-14']*3+
                 ['2020-08-15']*3 + ['2020-08-16']*3 + ['2020-08-17']*3,
          'SCORE':[i*((10)**j) for i in range (1,7) for j in range (1,4)],
          'GAME':['X']*6 + ['Y']*6 + ['Z']*6}


df1 = pd.DataFrame(data1)
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33