Replacing NaNs in DataFrame not working due to non-float NaN values

Question

I'm trying to write a DataFrame to an Excel file using xlsxwriter and am having some issues with NaN values not being filled properly.

Regardless of whether I use df.fillna, df.replace or the na_rep option on the writer, a few columns retain any NaN values in the DataFrame.

Investigating using pdb.set_trace(), I found the following:

(Pdb) df['col_name'][0]
Decimal('NaN')
(Pdb) Decimal(np.nan)
Decimal('NaN')
(Pdb) df['col_name'][0]==Decimal(np.nan)
False
(Pdb) na=df['col_name'][0]
(Pdb) na
Decimal('NaN')
(Pdb) na==df['col_name'][0]
False
(Pdb) df['col_name'][0]
Decimal('NaN')

How can I identify these values to replace them if I can't define them?

Is there a specific reason you are using `decimal.Decimal` types instead of standard `float` ? — jpp, Mar 13 '18 at 11:54
The data filling the DataFrame is pulled from a Django project, and the field relating to the column in question is a Decimal field. — mazjin, Mar 13 '18 at 11:57
got it. I provided a solution below which should deal with the issue. — jpp, Mar 13 '18 at 11:58

score 2 · Accepted Answer · answered Mar 13 '18 at 11:57

If Decimal format is not important to you, you can recast as float.

If Decimal format is important to you, then pandas isn't the ideal choice since it is not vectorizable like numpy arrays.

import pandas as pd, numpy as np
from decimal import Decimal

s = pd.Series([Decimal(np.nan), Decimal(5.1)])

test = s.isnull().tolist()  # [False, False]

s = s.astype(float)

test = s.isnull().tolist()  # [True, False]

SergiyKolesnikov · Answer 2 · 2018-03-13T12:04:46.907

1

You cannot use == to check for NaN. You should use np.isnan().

edited Mar 13 '18 at 12:04

answered Mar 13 '18 at 11:49

SergiyKolesnikov

7,369
2
26
47

`np.isnan` won't accept the value as an input type, but using `pd.isnull` returns `False`. – mazjin Mar 13 '18 at 11:52

Markus Semmler · Answer 3 · 2021-03-31T23:18:29.383

1

Simply use the following function:

def pandas_replace_decimal_by_value(df, column, value):
    idx = df[column].apply(math.isnan)
    df.loc[idx, column] = value

edited Mar 31 '21 at 23:18

answered Mar 31 '21 at 23:13

Markus Semmler

11
2

Replacing NaNs in DataFrame not working due to non-float NaN values

3 Answers3