I am trying to understand None vs NaN and proper syntax

Question

I am importing an Excel spreadsheet into pandas and need to determine how many cells contain 'NONE'. The snippet below, my best attempt at replicating the issue, replaces 'NONE' with None. However, when I then search for None, the conditional statement fails. Why is that?

import pandas as pd
import numpy as np


df = pd.DataFrame(np.array([['B',1,3.4], ['A','NONE',8.9],['C',3,4.6]]), 
    columns=['Part','Quantity','Cost'])

df.replace('NONE', None, inplace=True)

column = set()
count = 0

for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        val = df.iat[row,col]
        if (val == None):
            column.add(df.columns[col])
            count += 1

print(count)
print(list(column))

Lastly, if I use pd.NA instead of None as the replacement value, the conditional statement passes, but only if I write:

if (val is pd.NA):

What really throws me off is that the same check for None works in the IPython console:

In [0]: a = None
In [1]: a == None
Out[1]: True

I am basically trying to understand why the conditional statement fails to check for None. The intent is to scrub the DataFrame and then load it into scikit for regression analysis.

Thank you so much in advance!

https://stackoverflow.com/questions/17534106/what-is-the-difference-between-nan-and-none — Mitch Wheat, Sep 14 '21 at 04:34
`None` is not equal to `NaN`, in fact, *nothing* is equal to nan — juanpa.arrivillaga, Sep 14 '21 at 04:47

score 2 · Answer 1 · answered Sep 14 '21 at 04:42

First issue: replacing 'NONE' with None does not work as expected. Why: according to the docs https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html the default value of method is pad, so when the replacement value is None the matched cell is forward-filled from the row above instead of being set to None. Note that None is a special value here.
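One way to sidestep the question of how replace treats None is to count the 'NONE' strings directly, or to pass np.nan explicitly as the replacement value. The following is only a minimal sketch that reuses the DataFrame from the question; the names mask, cleaned and missing are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Same frame as in the question. np.array coerces every cell to a string here.
df = pd.DataFrame(
    np.array([['B', 1, 3.4], ['A', 'NONE', 8.9], ['C', 3, 4.6]]),
    columns=['Part', 'Quantity', 'Cost'],
)

# Boolean mask of the cells that hold the literal string 'NONE'.
mask = df == 'NONE'

count = int(mask.sum().sum())                 # total number of matching cells
columns = mask.columns[mask.any()].tolist()   # columns with at least one match

# Alternative: replace with an explicit np.nan, then count with isna().
cleaned = df.replace('NONE', np.nan)
missing = int(cleaned.isna().sum().sum())

print(count, columns, missing)
```

Both counts should agree for this frame, and the vectorised mask avoids the nested loop over df.iat.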

Second issue: None vs np.nan. As mentioned in the comment, they are very different. type(np.nan) is float, np.nan == np.nan returns False, but np.nan is np.nan returns True. Yes, it is not very intuitive.
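The comparisons above can be checked in a few lines. This is a small sketch, not a definitive recipe; the variable names are made up for illustration. It also shows pd.isna, which treats None, NaN and pd.NA alike and so does not rely on == or is.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float.
nan_type = type(np.nan)

# NaN never compares equal to anything, including itself.
eq_result = np.nan == np.nan

# Identity only holds because both names refer to the very same object.
same_object = np.nan is np.nan

# A freshly created NaN is a different object, so an identity check fails.
other_nan = float('nan')
other_is_same = other_nan is np.nan

# pd.isna recognises all of the usual missing-value markers.
checks = [pd.isna(v) for v in (None, np.nan, other_nan, pd.NA)]

print(nan_type, eq_result, same_object, other_is_same, checks)
```

In the loop from the question, a test such as pd.isna(val) would therefore be a more robust condition than val == None or val is pd.NA.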

Kate Melnykova

Note, in general, you can't use `is` to check for `NaN` – juanpa.arrivillaga Sep 14 '21 at 04:48
`np.nan is np.nan` *happens* to be true, because it is the same object. NaN should really never be checked for with identity (just like any other numeric object). Note, `numpy.nan` is just a `float('nan')` object – juanpa.arrivillaga Sep 14 '21 at 04:51
