
`pandas.DataFrame.to_dict` leaves `nan` as `nan` and converts null to `None`. As explained in "Python comparison ignoring nan", this is sometimes suboptimal.

Is there a way to convert all nans to None? (either in pandas or later on in Python)

E.g.,

>>> df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]})
>>> df
     a     b
0  1.0  None
1  NaN   foo
>>> df.to_dict()
{'a': {0: 1.0, 1: nan}, 'b': {0: None, 1: 'foo'}}

I want

{'a': {0: 1.0, 1: None}, 'b': {0: None, 1: 'foo'}}

instead.

sds

2 Answers

10
import pandas as pd

df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]})
df.where((pd.notnull(df)), None)
Out[850]: 
      a     b
0     1  None
1  None   foo
df.where((pd.notnull(df)), None).to_dict()
Out[851]: {'a': {0: 1.0, 1: None}, 'b': {0: None, 1: 'foo'}}
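
Depending on the pandas version, `where(..., None)` may leave `NaN` in a float column because the column is not upcast to `object`. A minimal sketch, assuming the same `df` as above, casts to `object` first. It also shows a plain-Python cleanup applied after `to_dict()`; `nan_to_none` is a hypothetical helper written for this illustration, not part of pandas.

```python
import math

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, "foo"]})

# Cast to object first so every column can hold None, then mask missing values.
result = df.astype(object).where(df.notna(), None).to_dict()
print(result)


def nan_to_none(value):
    """Hypothetical helper: recursively replace float NaN with None in nested dicts."""
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Alternative: leave the DataFrame untouched and clean the dict afterwards.
cleaned = nan_to_none(df.to_dict())
print(cleaned)
```

The second variant keeps the numeric dtypes of the DataFrame intact and only touches the exported dictionary, which may matter if the frame is still used for computation.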
E. Zeytinci
BENY
  • I'll note that this does the same thing, converts every column to an object type, just that it does it in two steps. – cs95 Jan 25 '18 at 22:58
  • @cᴏʟᴅsᴘᴇᴇᴅ yep, you are right, almost the same :-) – BENY Jan 25 '18 at 22:59
  • Just mentioning that since OP seems to think this is converting the data to string (which isn't the case!). – cs95 Jan 25 '18 at 23:00
  • @cᴏʟᴅsᴘᴇᴇᴅ: this is different from what you suggested because it works on the externally generated `DataFrame`, as opposed to creating a generic DF from scratch. – sds Jan 26 '18 at 02:39
  • @sds I am aware of what it does. My point in my previous comment was that the end result is the same (a generic dataframe), not a dataframe of strings like you initially surmised. I was only addressing your misconception, nothing more. – cs95 Jan 26 '18 at 02:43
3

Initialise as an object DataFrame (at your peril...):

df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]}, dtype=object)    
df

      a     b
0     1  None
1  None   foo

For the first column, pandas tries to infer the dtype and settles on float, storing the missing value as `NaN`. You can prevent that by forcing the dtype to remain `object`, which suppresses any type conversion.
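
The inference described above can be checked directly. This is a small sketch using the same example data; the variable names `inferred` and `forced` are chosen here for illustration only.

```python
import pandas as pd

data = {"a": [1, None], "b": [None, "foo"]}

# Default construction: pandas infers a float dtype for "a" and stores NaN.
inferred = pd.DataFrame(data)
print(inferred.dtypes)

# Forcing object dtype keeps the Python objects, including None, as they are.
forced = pd.DataFrame(data, dtype=object)
print(forced.dtypes)
print(forced.to_dict())
```

The trade-off is that object columns hold generic Python objects, so numeric operations on them no longer use the fast typed code paths.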

cs95
  • This is cheating. I have _numeric_ columns in the `DataFrame`, and converting it to string loses information. – sds Jan 25 '18 at 22:55
  • @sds No, there is no string conversion taking place. – cs95 Jan 25 '18 at 22:56
  • Each column is initialised as column of python objects. Pandas no longer makes assumptions about what its content is, and falls back to slow methods of operating on it. – cs95 Jan 25 '18 at 22:57
  • I had a feeling though that `df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]})` was an MCVE to give a starting DF to play with. In reality, if you're at the end of a chain of processes, does it make sense to convert your whole resulting DF to `object` before `to_dict()`? – roganjosh Jan 25 '18 at 22:59
  • @sds `object != str` – juanpa.arrivillaga Jan 25 '18 at 22:59
  • @cᴏʟᴅsᴘᴇᴇᴅ I've just seen your comment on the other answer so I'm probably wrong here. – roganjosh Jan 25 '18 at 23:01
  • @roganjosh It usually doesn't make sense converting any dataframe to object except in the rarest of cases. OP seems to have a good reason for wanting to do so, so I'm not getting in their way here... – cs95 Jan 25 '18 at 23:01
  • @cᴏʟᴅsᴘᴇᴇᴅ No, what I meant by my very last comment is I missed something. `df.where((pd.notnull(df)), None).to_dict()` looks the business, but you stated it's converting to `object` type in two steps. So your answer, on the surface, _does_ look like a cheat to me because you alter the DF at creation but ultimately it doesn't matter. +1 for reshaping my thinking :) – roganjosh Jan 25 '18 at 23:06
  • @roganjosh Cheers, as long as you call `pd.DataFrame` _somewhere_, this works :D – cs95 Jan 25 '18 at 23:17