I am importing an Excel spreadsheet into pandas and need to determine how many cells have 'NONE' in them. The snippet below, my best attempt at replicating the issue, replaces 'NONE' with None. However, when I then search for None, the conditional statement never matches. Why is that?

import pandas as pd
import numpy as np


df = pd.DataFrame(np.array([['B',1,3.4], ['A','NONE',8.9],['C',3,4.6]]), 
    columns=['Part','Quantity','Cost'])

df.replace('NONE', None, inplace=True)

column = set()
count = 0

for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        val = df.iat[row,col]
        if (val == None):
            column.add(df.columns[col])
            count += 1

print(count)
print(list(column))

Lastly, when I replace 'NONE' with pd.NA instead, the conditional statement passes, but only if I use

if (val is pd.NA):

and what really throws me off is when I use the IPython console to check for None,

In [0]: a = None
In [1]: a == None
Out [1]: True

I am basically trying to understand why the conditional statement fails to check for None. The intent is to scrub the DataFrame and then load it into scikit for regression analysis.

Thank you so much in advance!

1 Answer

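First issue: replacing 'NONE' with None does not work as expected. Why: according to the docs https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html the default value of method is pad, so when the replacement value is None the 'NONE' cell is filled with the value from the previous row instead of being set to None. Note that None is a special value here: it is treated as "no replacement value given", and this handling can differ between pandas versions.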
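One way to sidestep the None special case is to count the 'NONE' strings directly, or to replace them with np.nan and let pandas detect the missing cells with isna(). The following is a minimal sketch using the data from the question; the variable names are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Same data as in the question; np.array turns every cell into a string.
df = pd.DataFrame(
    np.array([['B', 1, 3.4], ['A', 'NONE', 8.9], ['C', 3, 4.6]]),
    columns=['Part', 'Quantity', 'Cost'],
)

# Option 1: count the 'NONE' strings directly, without replacing anything.
none_mask = df == 'NONE'
none_count = int(none_mask.sum().sum())
none_columns = [c for c in df.columns if none_mask[c].any()]

# Option 2: replace with np.nan (not None) and use pandas' missing-value check.
cleaned = df.replace('NONE', np.nan)
nan_mask = cleaned.isna()
nan_count = int(nan_mask.sum().sum())
nan_columns = [c for c in cleaned.columns if nan_mask[c].any()]

print(none_count, none_columns)  # 1 ['Quantity']
print(nan_count, nan_columns)    # 1 ['Quantity']
```

Both options avoid the explicit double loop over df.iat, which probably matters more on a real spreadsheet than on this three-row example.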

Second issue: None vs np.nan. As mentioned in the comment, they are very different. type(np.nan) is float, np.nan == np.nan returns False, but np.nan is np.nan returns True. Yes, it is not very intuitive.

Kate Melnykova