Adding 2 solutions here:
Using pd.shift
(pointed out earlier by Lukas Hestermeyer; I've added a simplified version)
Using rolling window (this is literally a one-liner)
Both solutions assume that;
Dates
are sorted in ascending order (if not, sorting should be done before proceeding)
- Every weekend(saturday and sunday) record is succeeded by a monday record. Additional checks would need to be added in case of mising data
Part 1 | Data Prep:
import pandas as pd
import numpy as np
# STEP 1: Create DF
Datas = [
'2019-07-02',
'2019-07-03',
'2019-07-04',
'2019-07-05',
'2019-07-06',
'2019-07-07',
'2019-07-08',
'2022-03-10',
'2022-03-11',
'2022-03-12',
'2022-03-13',
'2022-03-14'
]
Volume = [17, 30, 20, 21, 5, 10, 12, 24, 18, 4, 1, 5]
WEEKDAY = [1, 2, 3, 4, 5, 6, 0, 3, 4, 5, 6, 0]
dic = {'Datas': Datas, 'Volume': Volume, 'WEEKDAY': WEEKDAY}
df['Datas'] = pd.to_datetime(df['Datas'])
df = df.set_index('Datas')
df = pd.DataFrame(dic)
Part 2 | Solutions:
Solution 1 [pd.shift] :
# STEP 1: add shifts
df['shift_1'] = df['Volume'].shift(1)
df['shift_2'] = df['shift_1'].shift(1)
# STEP 2: sum Volume with shifts where weekday==0
cols_to_sum = ['Volume', 'shift_1', 'shift_2']
df['Volume'] = df[['WEEKDAY'] + cols_to_sum].apply(lambda x: int(x[1]) if x[0] else int(x[1] + x[2] + x[3]), axis=1)
df = df.drop(['shift_1', 'shift_2'], axis=1)
df
Solution 2 [rolling window] :
# use rolling window of size 3 to sum where weekday == 0
df['Volume'] = np.where(
df['WEEKDAY'] == 0,
df['Volume'].rolling(window=3, center=False).sum(),
df['Volume']
)
df
Part 3 | Removing weekend records :
df = df.loc[~df['WEEKDAY'].isin([5, 6])]
df