How to get the same dict from a Pandas.DataFrame.to_dict when it has `nan`?

Question

I have a Pandas DataFrame constructed from a dict containing a nan (e.g. float("nan")). When I call .to_dict() on it, I get a different dict: the nan value is something "else".

Is it possible to know what this new nan value is?

Here is a toy example I created, and a bunch of checks I did:

import numpy as np
import pandas as pd

a_dict = {
    "a": (1, 2),
    "b": (3, float("nan")),
}
df = pd.DataFrame(a_dict)

print(df.to_dict())
# {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}}

# to_dict() gives a different dict:
print(a_dict == a_dict) # True
print(df.to_dict() == a_dict)  # False

print(df.to_dict()["b"][1]) # nan
print(type(df.to_dict()["b"][1])) # <class 'float'>


print(df.to_dict()["b"][1] == float("nan"))  # False
print(df.to_dict()["b"][1] == np.nan)  # False
print(df.to_dict()["b"][1] == pd.NA)  # False
print(df.to_dict()["b"][1] is None)  # False
print(np.isnan(df.to_dict()["b"][1]))  # True
print(pd.isna(df.to_dict()["b"][1]))  # True

In terms of motivation, this is biting me when I try to write tests using unittest.TestCase.assertEqual.

Thanks upfront.

Related but didn't help:

This seems to be just because of `float('nan') != float('nan')`. — Mechanic Pig, Oct 12 '22 at 06:03
Also if using np.nan then instead of checking `np.nan == np.nan #False` you can check `np.nan in (np.nan,) #True` — Deepak Tripathi, Oct 12 '22 at 07:52

Firefighting Physicist · Answer 1 · 2022-10-12T08:02:55.783

As you stated, to_dict() gives a different dict, but that is not related to the nan value.
df.to_dict() yields {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}} and not {'a': (1, 2), 'b': (3, nan)}, so the two are not equal. Replace the nan in a_dict with a number (e.g. 4) and df.to_dict() == a_dict will still evaluate to False, so the nan is not your problem.

I would like to point out that np.nan == np.nan evaluates to False. The fact that a_dict == a_dict evaluates to True is due to how equality is defined for dictionaries: two dicts are equal if they have the same keys and, for each key, the values are either the same object or compare equal. See here for more info.
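A small sketch of that identity-or-equality rule, using only the standard library. The variable names are just for illustration; the point is that a container comparison succeeds when both sides hold the very same nan object, and fails when they hold two distinct nan objects.

```python
nan = float("nan")
other_nan = float("nan")  # a second, distinct nan object

# nan never compares equal to anything, itself included
print(nan == nan)  # False

# containers check identity first, then equality
same_object = (3, nan) == (3, nan)
different_objects = (3, nan) == (3, other_nan)
print(same_object)        # True
print(different_objects)  # False

# the same rule is why a dict equals itself even with nan inside
a_dict = {"a": (1, 2), "b": (3, nan)}
print(a_dict == a_dict)                                 # True
print(a_dict == {"a": (1, 2), "b": (3, other_nan)})     # False
```

This is also why `np.nan in (np.nan,)` is True: membership tests use the same identity shortcut.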

To answer your initial question "How to get the same dict from a Pandas.DataFrame.to_dict?", see here. The tuples in your dict, combined with pandas automatically setting the datatype, make this a pain and make the code below fail.

~~Basically you could do~~

d = df.to_dict('list')
{i: tuple(d[i]) for i in d.keys()} == a_dict  # True only if no value is nan; False for the a_dict above
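If the goal is only to compare in a test, one possible workaround is a nan-aware comparison instead of plain ==. The helper below is a hypothetical sketch (nan_equal is not part of pandas or unittest); it assumes that lists and tuples should count as equivalent and that two nan values in the same position should count as equal. The `actual` dict is written out by hand to stand in for the output of df.to_dict('list').

```python
import math


def nan_equal(x, y):
    """Hypothetical helper: recursive equality that treats nan as equal to nan."""
    # two float nans in the same position count as equal
    if isinstance(x, float) and isinstance(y, float):
        if math.isnan(x) and math.isnan(y):
            return True
    # dicts: same keys, and values equal under this same rule
    if isinstance(x, dict) and isinstance(y, dict):
        return x.keys() == y.keys() and all(nan_equal(x[k], y[k]) for k in x)
    # assumption: lists and tuples are interchangeable for this comparison
    if isinstance(x, (list, tuple)) and isinstance(y, (list, tuple)):
        return len(x) == len(y) and all(nan_equal(a, b) for a, b in zip(x, y))
    return x == y


expected = {"a": (1, 2), "b": (3, float("nan"))}
actual = {"a": [1, 2], "b": [3.0, float("nan")]}  # stand-in for df.to_dict('list')

print(expected == actual)           # False
print(nan_equal(expected, actual))  # True
```

In a unittest.TestCase this could be used as self.assertTrue(nan_equal(expected, actual)).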

Deepak Tripathi · Answer 2 · 2022-10-12T16:16:28.177


This may not be the best way, but here is how you can do the check, for testing purposes only:

import pandas as pd
import numpy as np

class custom_dict(dict):
    def __eq__(self, __o: object) -> bool:
        if isinstance(__o, dict):
            return self.keys() == __o.keys() and all(list(self[k1]) in (list(__o[k1]),) for k1 in self.keys())
        return False

a_dict = {
            "a": (1, 2),
            "b": (3, np.nan),
        }
df = pd.DataFrame(a_dict, dtype=object)
print(df.to_dict('list',into=custom_dict))
print(a_dict)
print(df.to_dict('list', into=custom_dict)["b"][1] in (np.nan,))  # True
print(df.to_dict('list', into=custom_dict) == a_dict)  # True

edited Oct 12 '22 at 16:16

answered Oct 12 '22 at 08:02

Deepak Tripathi


Looks cool! Why do we need `from collections import defaultdict` and `from functools import partial`? – Tal Galili Oct 12 '22 at 14:23

We do not need these modules; I was doing my own testing. Let me remove them. – Deepak Tripathi Oct 12 '22 at 16:16
