How to separate between empty string, np.nan and None when using to_excel(...) function for pandas dataframe?

Question

After constructing data frame df containing np.nan, None and empty string ('') values in Python 3.8.3 and Pandas 1.0.4

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'b':None,'c':''},index=[0])
df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
df = pd.concat([df1,df2], axis=0, sort=True)
print(df)

data frame df looks like

     a     b     c
0  NaN  None      
0  1.0     1  None

Now I would like to store the values to Excel using to_excel() function. However, after running command

df.to_excel('nan_none_empty.xlsx')

the result rather looks like

excel_picture

with np.nan, None and empty string ('') not being separable from each other.

It is possible to separate empty string ('') from np.nan and None with option na_rep as follows

df.to_excel('nan_none_empty2.xlsx',na_rep='?')

giving result

But it seems like, for this problem, I am running out of options for to_excel(...) function in order to be able to separate np.nan from None in the Excel export.

How could one neatly separate between np.nan and None when exporting df to Excel?

Seems like `df.fillna(value='?')` replaces both `np.nan` and `None` with `'?'`. Therefore, `df.fillna` cannot be used to solve the problem. — Heikki, Sep 23 '20 at 19:59
take a look at [the difference between NaN and None in pandas](https://stackoverflow.com/q/17534106/6692898), it seems there is no way to distinguish them — RichieV, Sep 23 '20 at 20:01
Out of curiousity, why would you want to make this distinguishment? These all look like missing values — Erfan, Sep 23 '20 at 20:27
I am trying to show a dictionary containing samples of facts in a matrix: (i) value `None` means that corresponding fact sample `{'b':None,'c':''}` had an element `b` with `None` value, (ii) value `np.nan`, means that the corresponding fact sample `{'b':None,'c':''}` did not contain element `'a'`. — Heikki, Sep 24 '20 at 04:40

score 2 · Accepted Answer · answered Sep 23 '20 at 20:08

If you'd like to distinguish between different null types, your best bet is to replace values before exporting to Excel. Converting to a string is one way to make sure you are not conflating None, np.NaN, pd.NaT, etc...

df1 = pd.DataFrame({'b':None,'c':''},index=[0])
df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
df = pd.concat([df1,df2], axis=0, sort=True)

null_map = {'nan': '-',
            'None': '?'} # Add the string representation for other types you may need

df = df.applymap(lambda x:
                 null_map[str(x)]
                 if str(x) in null_map
                 else x)
df.to_excel('nan_none_empty2.xlsx')

score 1 · Answer 2 · answered Sep 23 '20 at 20:09

1

How about something like this?:

df = df.applymap(lambda x: str(x) if x in [None, ''] else x)
df.to_excel('nan_none_empty.xlsx',na_rep='np.nan')

answered Sep 23 '20 at 20:09

Gil Pinsky

2,388
1
12
17

How to separate between empty string, np.nan and None when using to_excel(...) function for pandas dataframe?

2 Answers2