0

So, I have this dataframe where the 'EndDate' is in the same row as 'StartDate'. I would like to replace it to the same position as the 'Date' column based on its date. Also, it will forward fill the value available based on the 'NewDate'

Here is the original data frame:

df = pd.DataFrame({
    'Date': ['1/1/2022', '1/2/2022', '1/3/2022', '1/4/2022', '1/5/2022', 
             '1/6/2022', '1/7/2022', '1/8/2022','1/9/2022'],
    'Value': [None, 10, None, 15, None, None, 50, None,None],
    'StartDate': [None, '1/2/2022', None, '1/4/2022', None, None, '1/7/2022', None, None],
    'EndDate': [None, '1/3/2022', None,  '1/6/2022',  None, None, '1/9/2022', None,None]
})
       Date  Value StartDate   EndDate
0  1/1/2022    NaN      None      None
1  1/2/2022   10.0  1/2/2022  1/3/2022
2  1/3/2022    NaN      None      None
3  1/4/2022   15.0  1/4/2022  1/6/2022
4  1/5/2022    NaN      None      None
5  1/6/2022    NaN      None      None
6  1/7/2022    50.0  1/7/2022  1/9/2022
7  1/8/2022    NaN      None      None
8  1/9/2022    NaN      None      None

I would like the output to be such as this dataframe:

# output
output = pd.DataFrame({
    'Date': ['1/1/2022', '1/2/2022', '1/3/2022', '1/4/2022', '1/5/2022', 
             '1/6/2022', '1/7/2022', '1/8/2022','1/9/2022'],
    'Value': [None, 10, 10, 15, 15, 15, 50, 50,50],
    'StartDate': [None, '1/2/2022', None, '1/4/2022', None, None, '1/7/2022', None, None],
    'EndDate': [None, '1/3/2022', None,  '1/6/2022',  None, None, '1/9/2022', None,None],
    'NewDate': [None, None, '1/3/2022',  None,  None, '1/6/2022', None, None,'1/9/2022']
})
       Date  Value StartDate   EndDate   NewDate
0  1/1/2022    NaN      None      None      None
1  1/2/2022   10.0  1/2/2022  1/3/2022      None
2  1/3/2022   10.0      None      None  1/3/2022
3  1/4/2022   15.0  1/4/2022  1/6/2022      None
4  1/5/2022   15.0      None      None      None
5  1/6/2022   15.0      None      None  1/6/2022
6  1/7/2022   50.0  1/7/2022  1/9/2022      None
7  1/8/2022   50.0      None      None      None
8  1/9/2022   50.0      None      None  1/9/2022

I only know how to forward fill the value if the date is already aligned but to take the date is I'm not sure yet.

end_indexes = df[df['NewDate'].notnull()].index

for i in range(len(end_indexes)):
    start_index = end_indexes[i-1] + 1 if i > 0 else 0
    end_index = end_indexes[i]
    df.loc[start_index:end_index, 'Value'] = df.loc[start_index:end_index, 'Value'].ffill()
  • 1
    Maybe it's just this example, but it looks like you just want to shift the EndDate and fill the value by 1 row? `df['NewDate'] = df['EndDate'].shift(); df['Value'] = df['Value'].ffill(limit=1)` - or is there more to it. – jqurious Jun 20 '23 at 23:39
  • Please clarify what yoiu mean by " I would like to replace it to the same position as the 'Date' column based on its date. Also, it will forward fill the value available based on the 'NewDate'" What is the logic you are trying to implement? Try to walk us through a couple of lines of your desired output and explain how you get the results shown. – itprorh66 Jun 20 '23 at 23:39
  • Beside the point, but are these dates m/d/y or d/m/y? You could convert them to ISO format so that we don't even have to think about it. See also [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jun 20 '23 at 23:47
  • @jqurious There's more than a row to forward fill, but for this example, I just show small date range, because let say that the 'EndDate' at February, I couldn't share the whole dataframe here. – Nazmi Husaini Jun 20 '23 at 23:54
  • @itprorh66 I tried to bring the date in 'EndDate' to match with the same date in the 'Date' column but I don't know how to do that, however, for filling in the value, I could do it because I already have the 'NewDate' column and using the code I mentioned above where it will take all notnull index and then fill in the value and repeat it until it happened for all indexes in 'NewDate' – Nazmi Husaini Jun 20 '23 at 23:54
  • @wjandrea Understood, I will look forward on the page you shared and revise my questions in the future. For question above, it is m/d/y. – Nazmi Husaini Jun 20 '23 at 23:56

1 Answers1

2
df.assign(
   NewDate = df.where(df.Date.isin(df.EndDate)).Date,
   Value   = df.Value.ffill()
)
       Date  Value StartDate   EndDate   NewDate
0  1/1/2022    NaN      None      None       NaN
1  1/2/2022   10.0  1/2/2022  1/3/2022       NaN
2  1/3/2022   10.0      None      None  1/3/2022
3  1/4/2022   15.0  1/4/2022  1/6/2022       NaN
4  1/5/2022   15.0      None      None       NaN
5  1/6/2022   15.0      None      None  1/6/2022
6  1/7/2022   50.0  1/7/2022  1/9/2022       NaN
7  1/8/2022   50.0      None      None       NaN
8  1/9/2022   50.0      None      None  1/9/2022
jqurious
  • 9,953
  • 1
  • 4
  • 14