1
import pandas as pd

df = pd.DataFrame([['NewJersy',0,'2020-08-29'],
                   ['NewJersy',12,'2020-08-30'],
                   ['NewJersy',12,'2020-08-31'],
                   ['NewJersy',None,'2020-09-01'],
                   ['NewJersy',None,'2020-09-02'],
                   ['NewJersy',None,'2020-09-03'],
                   ['NewJersy',5,'2020-09-04'],
                   ['NewJersy',5,'2020-09-05'],
                   ['NewJersy',None,'2020-09-06'],
                   ['NewYork',None,'2020-08-29'],
                   ['NewYork',None,'2020-08-30'],
                   ['NewYork',8,'2020-08-31'],
                   ['NewYork',7,'2020-09-01'],
                   ['NewYork',None,'2020-09-02'],
                   ['NewYork',None,'2020-09-03']],
                   columns=['FName', 'FVal', 'GDate'])

print(df)

I want to fill NULL value with previous record values. For example Column FValue has values NULL for 20-09-01 to 20-09-03. The NULL value should be replaced with value 12 taken from previous valid value i.e.,from 20-08-31.

Also if the value for date 2020-08-29 is null then it should be replaced with zero as it is the first date and it doesn't have previous record.

I tried below code but not working

df['F'] = df['F'].fillna(method='ffill')

Check for Expected Values here: Fill Null Values image

Thanks

  • Can you create a reproductive example of your data : https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – AlexisG Sep 06 '20 at 08:59

3 Answers3

1

You should first ensure your DataFrame is sorted along time in case:

df = df.sort_values('GDate').reset_index(drop=True)

Then you must fill the first value with a 0:

if pd.isnull(df.loc[0, 'FVal']):
    df.loc[0, 'FVal'] = df.loc[0, 'FVal']

And then forward fill as you did:

df['FVal'] = df['FVal'].fillna(method='ffill')

Note that the column name is FVal not F.

Chris
  • 387
  • 1
  • 8
0

Not sure if this is what you want. But this is what I would do

>>> import math
>>> for s in df.iterrows():
...     if math.isnan(s[1][1]):
...        df.iloc[s[0],1] = df.iloc[s[0] - 1,1]
...
>>> df
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   5.0  2020-08-29
10   NewYork   5.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
>>>
Peter Moore
  • 1,632
  • 1
  • 17
  • 31
  • Row 9 and 10 should be 0 as row9 is starting date and it don't have previous value – Suresh Kasu Sep 09 '20 at 06:26
  • @SurishKasu correct. Anyway you get the idea. Also it is performant to use itertuples() as it's faster than iterrows(). This is because it creates regular tuples instead of creating pandas series as it iterates. Documentation states that after python 3.7 it returns named tuples. – Peter Moore Sep 09 '20 at 13:24
0

You can try this:

df.GDate = pd.to_datetime(df.GDate)
for i in range(len(df)):
    if (np.isnan(df.FVal.loc[i])) and (i > 0):
        if (df.GDate.loc[i]-df.GDate.loc[i-1]).days == 1:
            print((df.GDate.loc[i]-df.GDate.loc[i-1]).days)
            df.FVal.loc[i] = df.FVal.loc[i-1]
        else:
            df.FVal.loc[i] = 0


Output

    FName       FVal    GDate
0   NewJersy    0.0     2020-08-29
1   NewJersy    12.0    2020-08-30
2   NewJersy    12.0    2020-08-31
3   NewJersy    12.0    2020-09-01
4   NewJersy    12.0    2020-09-02
5   NewJersy    12.0    2020-09-03
6   NewJersy    5.0     2020-09-04
7   NewJersy    5.0     2020-09-05
8   NewJersy    5.0     2020-09-06
9   NewYork     0.0     2020-08-29
10  NewYork     0.0     2020-08-30
11  NewYork     8.0     2020-08-31
12  NewYork     7.0     2020-09-01
13  NewYork     7.0     2020-09-02
14  NewYork     7.0     2020-09-03
  • Hi Mateus, This solution worked but received warning message C:\..\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py:670: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy iloc._setitem_with_indexer(indexer, value) 1 1 1 1 1 1 – Suresh Kasu Sep 08 '20 at 03:44