I am importing an Excel spreadsheet into pandas and need to determine how many cells contain 'NONE'. The snippet below, my best attempt at reproducing the issue, replaces 'NONE' with `None`. However, when I then search for `None`, the conditional statement fails. Why is that?

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([['B', 1, 3.4], ['A', 'NONE', 8.9], ['C', 3, 4.6]]),
                  columns=['Part', 'Quantity', 'Cost'])
df.replace('NONE', None, inplace=True)

# Walk every cell, counting the ones equal to None and recording their columns
column = set()
count = 0
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        val = df.iat[row, col]
        if (val == None):
            column.add(df.columns[col])
            count += 1
print(count)
print(list(column))
```

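A minimal sketch of counting the cells without relying on `replace(..., None)` at all, assuming the only sentinel is the exact string 'NONE'. The dict-based frame is illustrative data, not the spreadsheet: the `np.array` call in the snippet above coerces every value, numbers included, to a string. The sketch marks sentinel cells with `DataFrame.isin` and turns them into `NaN` with `DataFrame.mask`. It may also be worth checking the installed pandas version: in at least some older releases, the `replace` documentation states that when `value=None` and `to_replace` is a scalar, the `method` parameter (default `'pad'`) is used, which would fill 'NONE' from the row above instead of writing `None`.

```python
import pandas as pd

# Hypothetical stand-in data: built from a dict so each column keeps its own
# dtype (np.array would coerce every value, numbers included, to a string)
df = pd.DataFrame({
    "Part": ["B", "A", "C"],
    "Quantity": [1, "NONE", 3],
    "Cost": [3.4, 8.9, 4.6],
})

# Boolean mask that is True wherever a cell holds the exact string 'NONE'
is_none = df.isin(["NONE"])

# Total number of sentinel cells and the columns that contain at least one
count = int(is_none.to_numpy().sum())
columns = [col for col in df.columns if is_none[col].any()]

# Replace the sentinels with NaN (mask's default fill value), which pandas
# reports via isna() and scikit-learn's SimpleImputer treats as missing
clean = df.mask(is_none)

print(count)    # 1
print(columns)  # ['Quantity']
```

Counting from the `isin` mask keeps the count independent of how `replace` handles `None` in any given pandas version.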
Lastly, when I use `pd.NA` instead of `None` as the replacement value, the conditional statement passes, but only if I write it as `if (val is pd.NA):`.
What really throws me off is that checking for `None` in the IPython console works as expected:

```
In [0]: a = None
In [1]: a == None
Out[1]: True
```

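A short sketch of why a console check on a plain `None` and a check on DataFrame cells can disagree: `None == None` is `True`, but the missing-value markers pandas commonly uses (`np.nan`, `pd.NA`) do not behave like `None` under `==` or inside an `if`. The idiomatic tests are `is None` for a plain `None` and `pd.isna` for any of the three.

```python
import numpy as np
import pandas as pd

a = None
eq_check = a == None   # True, though PEP 8 prefers `is None` for singletons
is_check = a is None   # True

# NaN is unequal to everything, including None and itself
nan_eq_none = np.nan == None   # False
nan_eq_nan = np.nan == np.nan  # False

# pd.NA has no truth value, so bool(pd.NA) (and `if pd.NA:`) raises TypeError
try:
    bool(pd.NA)
    na_has_truth_value = True
except TypeError:
    na_has_truth_value = False

# pd.isna treats None, NaN and pd.NA uniformly as missing
flags = [pd.isna(v) for v in (None, np.nan, pd.NA)]
print(eq_check, is_check, nan_eq_none, nan_eq_nan, na_has_truth_value, flags)
```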
I am basically trying to understand why the conditional statement fails to detect `None`. The goal is to scrub the DataFrame and then load it into scikit-learn for regression analysis.
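Since the data is headed for scikit-learn, it may be simpler to turn 'NONE' into a missing value while the file is read. Both `pd.read_excel` and `pd.read_csv` accept an `na_values` argument; this sketch uses `read_csv` on an in-memory string as a stand-in for the spreadsheet (the inline rows are copied from the snippet above for illustration), so it runs without an `.xlsx` file or an Excel engine installed.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the spreadsheet contents; with the real file
# the call would be pd.read_excel(path, na_values=["NONE"]) instead
raw = io.StringIO(
    "Part,Quantity,Cost\n"
    "B,1,3.4\n"
    "A,NONE,8.9\n"
    "C,3,4.6\n"
)

# na_values makes the parser store 'NONE' as NaN while reading, so Quantity
# comes back as a numeric (float64) column instead of strings
df = pd.read_csv(raw, na_values=["NONE"])

missing_per_column = df.isna().sum()
count = int(missing_per_column.sum())
columns = missing_per_column[missing_per_column > 0].index.tolist()

print(count)    # 1
print(columns)  # ['Quantity']
```

Here `path` is only a placeholder. Handling the sentinel at load time also avoids the string coercion seen in the original snippet, so the numeric columns are ready to be imputed or filtered before fitting.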