Filling NaN values in Pandas by using previous values

Question

I have a Pandas DataFrame in the following format.

I am trying to fill the NaN value by using the most recent non-NaN value and adding one second to the time value. For example, in this case, the program should take the most recent non-NaN value of 8:30:20 and add one second to replace the NaN value. So, the replacement value should be 8:30:21. Is there a way in Pandas to simulate this process for the entire column?

Please provide a part of your data frame to make it easily accessible for others (copy/paste) ! — Boubacar Traoré, Jun 03 '23 at 02:41
Does this answer your question? [How to replace NaNs by preceding or next values in pandas DataFrame?](https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-or-next-values-in-pandas-dataframe) — ti7, Jun 03 '23 at 02:48
To make it easy for SO members to help you, it is recommended to paste a chunk of your dataframe (as I've done in my answer), it'd be great if you can avoid uploading images. — Boubacar Traoré, Jun 03 '23 at 03:31

Diego Quezada · Answer 1 · 2023-06-03T02:47:11.977

Considering a numerical dataset, you can use the following function:

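import pandas as pd

def fill(df, column):
    # Most recent known value; None until a non-NaN value is seen
    cache = None
    # .items() yields (index label, value), so df.loc works for any index
    for index, value in df[column].items():
        if pd.isna(value):
            # Compare with None so that a cached value of 0 is still used
            if cache is not None:
                df.loc[index, column] = cache + 1
                cache = cache + 1
        else:
            cache = value
    return df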

There are two special cases you did not consider in your description:

What if the first value of the column is None/NaN? In my implementation, this value remains None/NaN because there is no "most recent value". You could add a parameter that supplies a value for this case.
What if there are two None/NaN values in a row? In my implementation, the most recent value for the second None/NaN is the new value of the first None/NaN, so the filled values keep increasing by one. This behaviour comes from the line cache = cache + 1.
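
Both special cases can be checked with a small variant of the function above. This is only a sketch: the name fill_with_default and its default parameter are hypothetical (they are not part of the answer's code), and the sample column is made up for illustration. It works on a copy so the input DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd


def fill_with_default(df, column, default=None):
    """Fill NaNs with the previous value plus one.

    `default` is a hypothetical parameter: the value to use when the
    column starts with NaN and no previous value exists yet.
    """
    out = df.copy()
    cache = None
    # Iterate over the original column and write into the copy
    for index, value in df[column].items():
        if pd.isna(value):
            if cache is not None:
                cache = cache + 1
            elif default is not None:
                cache = default
            else:
                continue  # leading NaN with no default stays NaN
            out.loc[index, column] = cache
        else:
            cache = value
    return out


# Made-up sample: a leading NaN, a known 0, and two NaNs in a row
df = pd.DataFrame({"x": [np.nan, 0.0, np.nan, np.nan, 10.0, np.nan]})
filled = fill_with_default(df, "x")
filled_with_default = fill_with_default(df, "x", default=-1.0)
print(filled["x"].tolist())
print(filled_with_default["x"].tolist())
```

Without a default, the leading NaN is kept and the two consecutive NaNs after 0.0 become 1.0 and 2.0. With default=-1.0, only the first entry changes.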

Boubacar Traoré · Answer 2 · 2023-06-03T09:55:02.613

I would have suggested using method="ffill" from pandas fillna, but your case is more specific. You can use a custom function to fill the values into a new column, as follows:

Given this input as the DataFrame df:

     datetime     time1        date     time2     x1     x2     x3     x4
0  2023-01-03  08:30:18  01/03/2023  08:30:18  62.95  62.95  62.91  62.92
1  2023-01-03  08:30:19  01/03/2023  08:30:19  62.93  62.94  62.93  62.94
2  2023-01-03  08:30:26  01/03/2023  08:30:20  62.90  62.90  62.89  62.89
3  2023-01-03  08:30:21  01/03/2023       NaN  62.90  62.90  62.89  62.89
4  2023-01-03  08:30:22  01/03/2023  08:30:22  62.91  62.91  62.91  62.91
5  2023-01-03  08:30:23  01/03/2023  08:30:23  62.91  62.92  62.91  62.92

Define the custom function and fill your NaN:

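import pandas as pd

def custom_fill_na(s):
    # Take a Series of 'HH:MM:SS' strings; missing entries parse to NaT
    formatted_time = pd.to_datetime(s, format='%H:%M:%S')
    values = []
    previous = pd.NaT  # most recent known (or already filled) timestamp
    for current in formatted_time:
        if pd.isna(current) and not pd.isna(previous):
            # Missing value: continue one second after the previous one
            current = previous + pd.Timedelta(seconds=1)
        # A leading NaN has no previous value, so it stays missing
        values.append(pd.NaT if pd.isna(current) else current.time())
        previous = current
    return values

# Now call the function and store the result in a new column
df['time3'] = custom_fill_na(df.time2)
print(df)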

Output

     datetime     time1        date     time2     x1     x2     x3     x4     time3
0  2023-01-03  08:30:18  01/03/2023  08:30:18  62.95  62.95  62.91  62.92  08:30:18
1  2023-01-03  08:30:19  01/03/2023  08:30:19  62.93  62.94  62.93  62.94  08:30:19
2  2023-01-03  08:30:26  01/03/2023  08:30:20  62.90  62.90  62.89  62.89  08:30:20
3  2023-01-03  08:30:21  01/03/2023       NaN  62.90  62.90  62.89  62.89  08:30:21
4  2023-01-03  08:30:22  01/03/2023  08:30:22  62.91  62.91  62.91  62.91  08:30:22
5  2023-01-03  08:30:23  01/03/2023  08:30:23  62.91  62.92  62.91  62.92  08:30:23

Feel free to replace your column if desired. Hope it helps.
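
To see why a plain forward fill is not enough for this case, here is a minimal sketch that uses only the time2 values from the input above, rebuilt by hand as a Series. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The time2 column from the example input, rebuilt by hand
time2 = pd.Series(
    ["08:30:18", "08:30:19", "08:30:20", np.nan, "08:30:22", "08:30:23"]
)

# A plain forward fill copies the previous value unchanged
forward_filled = time2.ffill()
print(forward_filled.tolist())
```

The gap at position 3 is filled with a second 08:30:20 rather than 08:30:21, which is why the extra one-second step is needed.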

I was looking at your final Output data frame and noticed that on row 2, the time in the time1 column is 8:30:26 instead of 8:30:20. Was this just a copying error? — Rishab, Jun 03 '23 at 05:59
I used an automated tool to extract your dataframe from your screenshot, so it makes sense there might be minor errors in input (as you can see in my input example). You can easily edit my input. To avoid that next time, try to provide your input by yourself. — Boubacar Traoré, Jun 03 '23 at 09:52

mozway · Accepted Answer · 2023-06-03T04:59:31.727

You can convert your data with to_timedelta, forward-fill it with ffill, and add 1 second to the rows that were NaN:

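# Parse the strings as timedeltas (NaN becomes NaT)
df['col1'] = pd.to_timedelta(df['col1'])

# Forward-fill, then add 1 second only where the value was missing
df['col1'] = df['col1'].ffill().add(df['col1'].isna()*pd.Timedelta('1s'))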

Output:

             col1
0 0 days 08:30:18
1 0 days 08:30:19
2 0 days 08:30:20
3 0 days 08:30:21
4 0 days 08:30:22

Used input:

df = pd.DataFrame({'col1': ['8:30:18', '8:30:19', '8:30:20', np.nan, '8:30:22']})
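
Note that the expression above adds exactly one second to every filled row, so two NaN values in a row would both get the same time. If consecutive gaps should keep counting up, one possible extension (a sketch, not part of the original answer) is to number each missing row within its run of NaNs and add that many seconds. The sample data and variable names below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up input with two consecutive gaps and a trailing gap
df = pd.DataFrame({"col1": ["8:30:18", np.nan, np.nan, "8:30:21", np.nan]})
td = pd.to_timedelta(df["col1"])

is_missing = td.isna()
# Every non-missing value starts a new run; the gaps after it share its id
run_id = (~is_missing).cumsum()
# Position inside the run: 0 for the known value, then 1, 2, ... for gaps
offset = is_missing.astype(int).groupby(run_id).cumsum()

# Forward-fill, then add the per-row offset in seconds
filled = td.ffill() + pd.to_timedelta(offset, unit="s")
print(filled)
```

A NaN at the very start of the column has nothing to forward-fill from and stays NaT in this sketch.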

Converting back to strings

Use a custom function:

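def to_str(s):
    # Split the total seconds into hours, minutes, and seconds
    hours, rest = s.dt.total_seconds().divmod(3600)
    minutes, seconds = rest.divmod(60)
    # Zero-pad each part to two digits and join as HH:MM:SS
    return (hours.astype(int).astype(str).str.zfill(2)
            +':'+
            minutes.astype(int).astype(str).str.zfill(2)
            +':'+
            seconds.astype(int).astype(str).str.zfill(2)
            )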
    
df['col1'] = to_str(df['col1'])

Output:

       col1
0  08:30:18
1  08:30:19
2  08:30:20
3  08:30:21
4  08:30:22
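
As a possible alternative to the custom function, the timedeltas can be added to an arbitrary midnight timestamp and formatted with dt.strftime. This is a sketch that assumes every value is shorter than 24 hours (longer durations would wrap around to the next day); the reference date 1970-01-01 is an arbitrary choice.

```python
import pandas as pd

# Timedeltas like the ones produced by the fill step
td = pd.to_timedelta(pd.Series(["8:30:18", "8:30:19", "8:30:20", "8:30:21"]))

# Adding the durations to a midnight timestamp turns them into datetimes,
# which can then be formatted as HH:MM:SS strings
as_text = (pd.Timestamp("1970-01-01") + td).dt.strftime("%H:%M:%S")
print(as_text.tolist())
```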
