How does pandas fillna determine NaN?

Question

When using df.fillna(), which value/function does it use to determine if a value is NaN? NaT, for instance, does not get filled but pd.isnull() captures that.

Furthermore, is there a way to pass a function to fillna that determines whether a value is NaN or not, e.g.

df.fillna(na_function = pd.isnull,value= np.nan)

EDIT (added example):

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, "2018-02-10", np.nan],
     [None, pd.NaT, 0]])

df.isnull()
# [[False, False, True],
#  [True, True, False]]

df.fillna(np.nan,inplace=True)
# [[0, "2018-02-10", np.nan],
#  [np.nan, NaT, 0]]

I want it to fill all NaN/Null values where pd.isnull()==True including NaT.

Can you be more specific, with some data sample? – jezrael Jun 08 '20 at 06:52

score 1 · Answer 1 · answered Jun 08 '20 at 07:00

There is indeed a slight inconsistency here. isna tests for any null value (None, NaN or NaT), while fillna only processes NaN. One could argue that it is a feature, because you can choose which behavior you want.

BTW, filling all null values can easily be done using isna:

df[df.isna()] = replacement_value

The actual reason is probably that isna is an alias for isnull.
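
To make the mask-based approach concrete, here is a minimal sketch. The column names, the "missing" sentinel and the cast to object are assumptions made for illustration. The cast lets one replacement value land in numeric, datetime and string columns alike, at the cost of the original dtypes.

```python
import numpy as np
import pandas as pd

# pd.isna treats None, NaN and NaT as null; an ordinary value such as 0 is not
flags = [pd.isna(v) for v in (None, np.nan, pd.NaT, 0)]  # [True, True, True, False]

df = pd.DataFrame({
    "num": [0.0, np.nan],
    "when": pd.to_datetime(["2018-02-10", None]),  # None becomes NaT
    "text": ["a", None],
})

# Boolean frame marking every null cell, whatever its kind
null_mask = df.isna()

# Cast to object so a single sentinel fits every column, then replace masked cells
filled = df.astype(object).mask(null_mask, "missing")
```

If the original dtypes matter, a type-appropriate replacement per column is the safer route.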

Serge Ballesta

That was my go-to at the beginning, but it throws `ValueError: cannot reindex from a duplicate axis`, so I tried to see if `fillna()` could do the trick. I found the cause of the error though (a duplicated column), but I still think it is a bit counterintuitive that `isna` flags `NaT` but `fillna` does not. – CutePoison Jun 08 '20 at 07:02
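
The duplicated-column cause mentioned in the comment above can be checked directly before assigning through a boolean mask. A minimal sketch, assuming made-up column labels `a` and `b`; it only detects and drops repeated labels and does not attempt to reproduce the error itself:

```python
import numpy as np
import pandas as pd

# Illustrative frame in which the label "a" appears twice
df = pd.DataFrame([[0.0, np.nan, 1.0]], columns=["a", "b", "a"])

# Labels that occur more than once (every occurrence after the first is flagged)
dupes = df.columns[df.columns.duplicated()].tolist()

# Keep only the first occurrence of each label
deduped = df.loc[:, ~df.columns.duplicated()]
```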

score 1 · Answer 2 · answered Jun 08 '20 at 07:10

Assuming you have both NaN and NaT values in the dataframe, you can always check the dtypes and fill the datetime and non-datetime columns separately, like this:

x = df.select_dtypes(exclude=['datetime']) 
df[x.columns] = x.fillna(99)

x = df.select_dtypes(include=['datetime'])
df[x.columns] = x.fillna(pd.to_datetime('today'))

Taking your sample df as an example:

In [1997]: df 
Out[1997]: 
     0          1    2
0 0.00 2018-02-10  nan
1  nan        NaT 0.00

In [1998]: df.dtypes 
Out[1998]: 
0           float64
1    datetime64[ns]
2           float64

In [1999]: x = df.select_dtypes(exclude=['datetime'])    
In [2000]: df[x.columns] = x.fillna(99) 

In [2001]: df 
Out[2001]: 
      0          1     2
0  0.00 2018-02-10 99.00
1 99.00        NaT  0.00

In [2002]: x = df.select_dtypes(include=['datetime'])    
In [2003]: df[x.columns] = x.fillna(pd.to_datetime('today')) 

In [2004]: df 
Out[2004]: 
      0                          1     2
0  0.00 2018-02-10 00:00:00.000000 99.00
1 99.00 2020-06-08 12:42:18.819089  0.00
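
The two-step pattern above could be wrapped in a small function. This is only a sketch: `fill_by_dtype` and its parameter names are hypothetical, not part of pandas, and the sample frame uses only float and datetime columns.

```python
import numpy as np
import pandas as pd


def fill_by_dtype(frame, default, datetime_value):
    """Hypothetical helper: fill datetime columns with one value, all others with another."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].fillna(datetime_value)
        else:
            out[col] = out[col].fillna(default)
    return out


df = pd.DataFrame({
    0: [0.0, np.nan],
    1: pd.to_datetime(["2018-02-10", None]),  # None becomes NaT
    2: [np.nan, 0.0],
})

filled = fill_by_dtype(df, default=99, datetime_value=pd.Timestamp("2020-06-08"))
```

The function works on a copy, so the original frame keeps its null values.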

jezrael · Answer 3 · 2020-06-08T07:20:14.737

Create a dictionary of replacement values per column (here one value for datetime columns, one for string columns and one for all other columns) and pass it to DataFrame.fillna:

df = pd.DataFrame(
    [[0, pd.Timestamp("2018-02-10"), np.nan, 'a'],
     [None, pd.NaT, 0, None]])
print (df)
     0          1    2     3
0  0.0 2018-02-10  NaN     a
1  NaN        NaT  0.0  None

dates = df.select_dtypes(['datetime']).columns
strings = df.select_dtypes(['object']).columns

d1 = dict.fromkeys(dates, pd.Timestamp('2000-01-01'))
d2 = dict.fromkeys(strings, 'b')
d3 = dict.fromkeys(df.columns.difference(dates.union(strings)), 1)

#https://stackoverflow.com/a/26853961
d = {**d1, **d2, **d3}
df = df.fillna(d)
print (df)
     0          1    2  3
0  0.0 2018-02-10  1.0  a
1  1.0 2000-01-01  0.0  b

Detail:

print (d)
{1: Timestamp('2000-01-01 00:00:00'), 3: 'b', 0: 1, 2: 1}
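
Before relying on such a dictionary, it may help to check that it covers every column and that nothing is left unfilled. A minimal sketch with assumed column names `num` and `when`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num": [0.0, np.nan],
    "when": pd.to_datetime(["2018-02-10", None]),  # None becomes NaT
})

# One replacement value per column, chosen to match the column's dtype
fill_values = {"num": 1, "when": pd.Timestamp("2000-01-01")}

# Columns that have no entry in the dictionary (these would be left untouched)
uncovered = df.columns.difference(list(fill_values)).tolist()

filled = df.fillna(fill_values)

# Number of null cells still present after filling
remaining = int(filled.isna().sum().sum())
```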
