Python - Fill NULL with previous record value in a column

Question

import pandas as pd

df = pd.DataFrame([['NewJersy',0,'2020-08-29'],
                   ['NewJersy',12,'2020-08-30'],
                   ['NewJersy',12,'2020-08-31'],
                   ['NewJersy',None,'2020-09-01'],
                   ['NewJersy',None,'2020-09-02'],
                   ['NewJersy',None,'2020-09-03'],
                   ['NewJersy',5,'2020-09-04'],
                   ['NewJersy',5,'2020-09-05'],
                   ['NewJersy',None,'2020-09-06'],
                   ['NewYork',None,'2020-08-29'],
                   ['NewYork',None,'2020-08-30'],
                   ['NewYork',8,'2020-08-31'],
                   ['NewYork',7,'2020-09-01'],
                   ['NewYork',None,'2020-09-02'],
                   ['NewYork',None,'2020-09-03']],
                   columns=['FName', 'FVal', 'GDate'])

print(df)

I want to fill NULL value with previous record values. For example Column FValue has values NULL for 20-09-01 to 20-09-03. The NULL value should be replaced with value 12 taken from previous valid value i.e.,from 20-08-31.

Also if the value for date 2020-08-29 is null then it should be replaced with zero as it is the first date and it doesn't have previous record.

I tried below code but not working

df['F'] = df['F'].fillna(method='ffill')

Check for Expected Values here: Fill Null Values image

Thanks

Can you create a reproductive example of your data : https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — AlexisG, Sep 06 '20 at 08:59

score 1 · Answer 1 · answered Sep 06 '20 at 14:36

You should first ensure your DataFrame is sorted along time in case:

df = df.sort_values('GDate').reset_index(drop=True)

Then you must fill the first value with a 0:

if pd.isnull(df.loc[0, 'FVal']):
    df.loc[0, 'FVal'] = df.loc[0, 'FVal']

And then forward fill as you did:

df['FVal'] = df['FVal'].fillna(method='ffill')

Note that the column name is FVal not F.

Peter Moore · Answer 2 · 2020-09-06T14:25:20.453

0

Not sure if this is what you want. But this is what I would do

>>> import math
>>> for s in df.iterrows():
...     if math.isnan(s[1][1]):
...        df.iloc[s[0],1] = df.iloc[s[0] - 1,1]
...
>>> df
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   5.0  2020-08-29
10   NewYork   5.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
>>>

edited Sep 06 '20 at 14:25

answered Sep 06 '20 at 14:05

Peter Moore

1,632
1
17
31

Row 9 and 10 should be 0 as row9 is starting date and it don't have previous value – Suresh Kasu Sep 09 '20 at 06:26
@SurishKasu correct. Anyway you get the idea. Also it is performant to use itertuples() as it's faster than iterrows(). This is because it creates regular tuples instead of creating pandas series as it iterates. Documentation states that after python 3.7 it returns named tuples. – Peter Moore Sep 09 '20 at 13:24

score 0 · Accepted Answer · answered Sep 06 '20 at 14:34

You can try this:

df.GDate = pd.to_datetime(df.GDate)
for i in range(len(df)):
    if (np.isnan(df.FVal.loc[i])) and (i > 0):
        if (df.GDate.loc[i]-df.GDate.loc[i-1]).days == 1:
            print((df.GDate.loc[i]-df.GDate.loc[i-1]).days)
            df.FVal.loc[i] = df.FVal.loc[i-1]
        else:
            df.FVal.loc[i] = 0

Output

    FName       FVal    GDate
0   NewJersy    0.0     2020-08-29
1   NewJersy    12.0    2020-08-30
2   NewJersy    12.0    2020-08-31
3   NewJersy    12.0    2020-09-01
4   NewJersy    12.0    2020-09-02
5   NewJersy    12.0    2020-09-03
6   NewJersy    5.0     2020-09-04
7   NewJersy    5.0     2020-09-05
8   NewJersy    5.0     2020-09-06
9   NewYork     0.0     2020-08-29
10  NewYork     0.0     2020-08-30
11  NewYork     8.0     2020-08-31
12  NewYork     7.0     2020-09-01
13  NewYork     7.0     2020-09-02
14  NewYork     7.0     2020-09-03

Hi Mateus, This solution worked but received warning message C:\..\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py:670: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy iloc._setitem_with_indexer(indexer, value) 1 1 1 1 1 1 — Suresh Kasu, Sep 08 '20 at 03:44

Python - Fill NULL with previous record value in a column

3 Answers3