I have dataset as below:
import pandas as pd
data = pd.DataFrame({
'ID': ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25',
'2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
'Payment_Term': [7,8,3,6,4,7,8,5,3,6],
'Payment_Date': ['2020-07-05', '2020-07-05','2020-07-03', '2020-07-21', '2020-07-31',
'2020-08-15', '2020-08-22', '2020-06-16', '2020-06-23', '2020-07-05'],
'Due_Date': ['2020-07-03', '2020-07-07', '2020-07-03', '2020-07-20', '2020-07-29',
'2020-08-06', '2020-08-10', '2020-05-18', '2020-06-23', '2020-07-04'],
'Delay': [2,-2,0,1,2,9,12,29,0,1],
'Difference_Date': [0,3,1,14,11,5,3,0,38,8],
})
data
I want to add another column for Average_Diff
that calculates the average of rows before (excluding the current row) and grouping it by ID
.
I have tried df['Average_Diff'] = df.Difference_Date.rolling(window=2).mean()
and
df['Average_Diff'] = df.Difference_Date.expanding().mean()
but it is not working. My expected output is:
ID Invoice_Date Payment_Term Payment_Date Due_Date Delay Difference_Date Average_Diff
27459 2020-06-26 7 2020-07-05 2020-07-03 2 0 0
27459 2020-06-29 8 2020-07-05 2020-07-07 -2 3 0
27459 2020-06-30 3 2020-07-03 2020-07-03 0 1 1.5
27459 2020-07-14 6 2020-07-21 2020-07-20 1 14 1.33
27459 2020-07-25 4 2020-07-31 2020-07-29 2 11 4.5
27459 2020-07-30 7 2020-08-15 2020-08-06 9 5 5.8
27459 2020-08-02 8 2020-08-22 2020-08-10 12 3 5.67
48002 2020-05-13 5 2020-06-16 2020-05-18 29 0 0
48002 2020-06-20 3 2020-06-23 2020-06-23 0 38 0
48002 2020-06-28 6 2020-07-05 2020-07-04 1 8 19