2

After constructing data frame df containing np.nan, None and empty string ('') values in Python 3.8.3 and Pandas 1.0.4

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'b':None,'c':''},index=[0])
df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
df = pd.concat([df1,df2], axis=0, sort=True)
print(df)

data frame df looks like

     a     b     c
0  NaN  None      
0  1.0     1  None

Now I would like to store the values to Excel using to_excel() function. However, after running command

df.to_excel('nan_none_empty.xlsx')

the result rather looks like

excel_picture

with np.nan, None and empty string ('') not being separable from each other.

It is possible to separate empty string ('') from np.nan and None with option na_rep as follows

df.to_excel('nan_none_empty2.xlsx',na_rep='?')

giving result

enter image description here

But it seems like, for this problem, I am running out of options for to_excel(...) function in order to be able to separate np.nan from None in the Excel export.

How could one neatly separate between np.nan and None when exporting df to Excel?

Heikki
  • 2,214
  • 19
  • 34
  • Seems like `df.fillna(value='?')` replaces both `np.nan` and `None` with `'?'`. Therefore, `df.fillna` cannot be used to solve the problem. – Heikki Sep 23 '20 at 19:59
  • take a look at [the difference between NaN and None in pandas](https://stackoverflow.com/q/17534106/6692898), it seems there is no way to distinguish them – RichieV Sep 23 '20 at 20:01
  • Out of curiousity, why would you want to make this distinguishment? These all look like missing values – Erfan Sep 23 '20 at 20:27
  • I am trying to show a dictionary containing samples of facts in a matrix: (i) value `None` means that corresponding fact sample `{'b':None,'c':''}` had an element `b` with `None` value, (ii) value `np.nan`, means that the corresponding fact sample `{'b':None,'c':''}` did not contain element `'a'`. – Heikki Sep 24 '20 at 04:40

2 Answers2

2

If you'd like to distinguish between different null types, your best bet is to replace values before exporting to Excel. Converting to a string is one way to make sure you are not conflating None, np.NaN, pd.NaT, etc...

df1 = pd.DataFrame({'b':None,'c':''},index=[0])
df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
df = pd.concat([df1,df2], axis=0, sort=True)

null_map = {'nan': '-',
            'None': '?'} # Add the string representation for other types you may need

df = df.applymap(lambda x:
                 null_map[str(x)]
                 if str(x) in null_map
                 else x)
df.to_excel('nan_none_empty2.xlsx')
mgd
  • 306
  • 1
  • 6
1

How about something like this?:

df = df.applymap(lambda x: str(x) if x in [None, ''] else x)
df.to_excel('nan_none_empty.xlsx',na_rep='np.nan')
Gil Pinsky
  • 2,388
  • 1
  • 12
  • 17