1

I'm trying to write a DataFrame to an Excel file using xlsxwriter and am having some issues with NaN values not being filled properly.

Regardless of whether I use df.fillna, df.replace or the na_rep option on the writer, a few columns retain any NaN values in the DataFrame.

Investigating using pdb.set_trace(), I found the following:

(Pdb) df['col_name'][0]
Decimal('NaN')
(Pdb) Decimal(np.nan)
Decimal('NaN')
(Pdb) df['col_name'][0]==Decimal(np.nan)
False
(Pdb) na=df['col_name'][0]
(Pdb) na
Decimal('NaN')
(Pdb) na==df['col_name'][0]
False
(Pdb) df['col_name'][0]
Decimal('NaN')

How can I identify these values to replace them if I can't define them?

jpp
  • 159,742
  • 34
  • 281
  • 339
mazjin
  • 80
  • 2
  • 10
  • Is there a specific reason you are using `decimal.Decimal` types instead of standard `float` ? – jpp Mar 13 '18 at 11:54
  • The data filling the DataFrame is pulled from a Django project, and the field relating to the column in question is a Decimal field. – mazjin Mar 13 '18 at 11:57
  • got it. I provided a solution below which should deal with the issue. – jpp Mar 13 '18 at 11:58

3 Answers3

2

If Decimal format is not important to you, you can recast as float.

If Decimal format is important to you, then pandas isn't the ideal choice since it is not vectorizable like numpy arrays.

import pandas as pd, numpy as np
from decimal import Decimal

s = pd.Series([Decimal(np.nan), Decimal(5.1)])

test = s.isnull().tolist()  # [False, False]

s = s.astype(float)

test = s.isnull().tolist()  # [True, False]
jpp
  • 159,742
  • 34
  • 281
  • 339
1

You cannot use == to check for NaN. You should use np.isnan().

SergiyKolesnikov
  • 7,369
  • 2
  • 26
  • 47
1

Simply use the following function:

def pandas_replace_decimal_by_value(df, column, value):
    idx = df[column].apply(math.isnan)
    df.loc[idx, column] = value