So, I have this dataframe where the 'EndDate' is in the same row as 'StartDate'. I would like to replace it to the same position as the 'Date' column based on its date. Also, it will forward fill the value available based on the 'NewDate'
Here is the original data frame:
df = pd.DataFrame({
'Date': ['1/1/2022', '1/2/2022', '1/3/2022', '1/4/2022', '1/5/2022',
'1/6/2022', '1/7/2022', '1/8/2022','1/9/2022'],
'Value': [None, 10, None, 15, None, None, 50, None,None],
'StartDate': [None, '1/2/2022', None, '1/4/2022', None, None, '1/7/2022', None, None],
'EndDate': [None, '1/3/2022', None, '1/6/2022', None, None, '1/9/2022', None,None]
})
Date Value StartDate EndDate
0 1/1/2022 NaN None None
1 1/2/2022 10.0 1/2/2022 1/3/2022
2 1/3/2022 NaN None None
3 1/4/2022 15.0 1/4/2022 1/6/2022
4 1/5/2022 NaN None None
5 1/6/2022 NaN None None
6 1/7/2022 50.0 1/7/2022 1/9/2022
7 1/8/2022 NaN None None
8 1/9/2022 NaN None None
I would like the output to be such as this dataframe:
# output
output = pd.DataFrame({
'Date': ['1/1/2022', '1/2/2022', '1/3/2022', '1/4/2022', '1/5/2022',
'1/6/2022', '1/7/2022', '1/8/2022','1/9/2022'],
'Value': [None, 10, 10, 15, 15, 15, 50, 50,50],
'StartDate': [None, '1/2/2022', None, '1/4/2022', None, None, '1/7/2022', None, None],
'EndDate': [None, '1/3/2022', None, '1/6/2022', None, None, '1/9/2022', None,None],
'NewDate': [None, None, '1/3/2022', None, None, '1/6/2022', None, None,'1/9/2022']
})
Date Value StartDate EndDate NewDate
0 1/1/2022 NaN None None None
1 1/2/2022 10.0 1/2/2022 1/3/2022 None
2 1/3/2022 10.0 None None 1/3/2022
3 1/4/2022 15.0 1/4/2022 1/6/2022 None
4 1/5/2022 15.0 None None None
5 1/6/2022 15.0 None None 1/6/2022
6 1/7/2022 50.0 1/7/2022 1/9/2022 None
7 1/8/2022 50.0 None None None
8 1/9/2022 50.0 None None 1/9/2022
I only know how to forward fill the value if the date is already aligned but to take the date is I'm not sure yet.
end_indexes = df[df['NewDate'].notnull()].index
for i in range(len(end_indexes)):
start_index = end_indexes[i-1] + 1 if i > 0 else 0
end_index = end_indexes[i]
df.loc[start_index:end_index, 'Value'] = df.loc[start_index:end_index, 'Value'].ffill()