-1

I have the following example Pandas DataFrame

df

UserID Total Date
1      20    2019-01-01
1      18    2019-01-02
1      22    2019-01-03
1      16    2019-01-04
1      17    2019-01-05
1      26    2019-01-06
1      30    2019-01-07
1      28    2019-01-08
1      28    2019-01-09
1      28    2019-01-10
2      22    2019-01-01
2      11    2019-01-02
2      23    2019-01-03
2      14    2019-01-04
2      19    2019-01-05
2      29    2019-01-06
2      21    2019-01-07
2      22    2019-01-08
2      30    2019-01-09
2      16    2019-01-10
3      27    2019-01-01
3      13    2019-01-02
3      12    2019-01-03
3      27    2019-01-04
3      26    2019-01-05
3      26    2019-01-06
3      30    2019-01-07
3      19    2019-01-08
3      27    2019-01-09
3      29    2019-01-10
4      29    2019-01-01
4      12    2019-01-02
4      25    2019-01-03
4      11    2019-01-04
4      19    2019-01-05
4      20    2019-01-06
4      33    2019-01-07
4      24    2019-01-08
4      22    2019-01-09
4      24    2019-01-10

What I'm trying to achieve is to add a column TotalPast3Days that is basically the sum of Total of the previous 3 days (excluding the current date in the row) for that particular UserID

How can this be done?

Taher Elhouderi
  • 233
  • 2
  • 11

2 Answers2

2
totals = []
for i in len(df.index):
    if i < 3:
        totals.append(0)
    
    elif df['UserID'].iloc[i] == df['UserID'].iloc[i-3]:
        total = df['Total'].iloc[i-1] + 
                df['Total'].iloc[i-2] + 
                df['Total'].iloc[i-3]
        totals.append(total)
    else:
        totals.append(0)
df['Sum of past 3'] = totals
5idneyD
  • 204
  • 1
  • 8
  • That would work if my date series were continues, but my real data is not. 2 rows back doesn't necessarily mean 2 days ago – Taher Elhouderi Sep 20 '21 at 12:07
  • I've just made an edit to ensure it is the same user so that the sum doesn't combine different users' numbers – 5idneyD Sep 20 '21 at 12:23
2

For the first 3 days, you will get a NaN because there are no "previous 3 days (excluding the current date in the row)"; but, for the rest, you can use shift like df['TotalPast3Days'] = df['Date'].shift(1) + df['Date'].shift(2) + df['Date'].shift(3)

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
user16714199
  • 228
  • 6
  • 19
  • I thought about using shift() but the real data I have is not continuous. There are gaps in the date time-series. i.e. 2 rows back does not necessarily imply 2 days ago. – Taher Elhouderi Sep 20 '21 at 12:06
  • @TaherElhouderi - If you want a solution for different data, why don't you provide example data that is similar to your real data? – Michael Szczesny Sep 20 '21 at 12:08
  • but then how do you even get the data of the last day if the last day itself is missing? – user16714199 Sep 20 '21 at 12:09
  • If it's missing, just assume 0. I did a really slow-performing implementation using nested loops and comparing the dates with a timedelta. I was hoping there's a faster way using built-in Pandas slicing tools. – Taher Elhouderi Sep 20 '21 at 12:12
  • 1
    @TaherElhouderi - If you have a naive solution it is best practice to include your approach in your question to make it clear what you expect. – Michael Szczesny Sep 20 '21 at 12:14