So I found out that the float NaN doesn't equal itself (np.nan == np.nan evaluates to False). My question is how to deal with that. Let's start with a DataFrame:
import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [0, 3, None]})
DF
     X
0  0.0
1  3.0
2  NaN
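As a side note, the None passed to the constructor is stored as NaN, because pandas infers a float64 dtype for the column. A small sketch to confirm that on this same frame (it only inspects the dtype and the missing-value mask):

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [0, 3, None]})

# None in a numeric list is converted to NaN, so the column is float64
print(DF['X'].dtype)            # float64

# isna() reports which entries are missing
print(DF['X'].isna().tolist())  # [False, False, True]
```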
DF['test1'] = np.where(DF['X'] == np.nan, 1, 0)
DF['test2'] = np.where(DF['X'].isin([np.nan]), 1, 0)
DF
     X  test1  test2
0  0.0      0      0
1  3.0      0      0
2  NaN      0      1
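The difference between test1 and test2 comes down to how each check treats NaN. A minimal sketch reproducing the two masks on a Series equivalent to DF['X'] (the name s is just for illustration), with isna() added for comparison:

```python
import numpy as np
import pandas as pd

# NaN compares unequal to everything, including itself
print(np.nan == np.nan)  # False

s = pd.Series([0, 3, None])  # same values as DF['X']

# Elementwise == against NaN is False for every row, so test1 is all zeros
eq_mask = s == np.nan

# isin() treats NaN in the lookup list as matching NaN in the Series
isin_mask = s.isin([np.nan])

# isna() is the explicit missing-value check
isna_mask = s.isna()

print(eq_mask.tolist())    # [False, False, False]
print(isin_mask.tolist())  # [False, False, True]
print(isna_mask.tolist())  # [False, False, True]
```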
So test1 and test2 aren't the same: the == comparison never matches NaN, while isin() does. Many others have recommended using pd.isnull(). My question is: is it safe to just use isin()? For example, if I need to create a new column with np.where, can I simply do:
DF['test3'] = np.where(DF['X'].isin([0, np.nan]), 1, 0)
Or should I always use pd.isnull(), like so:
DF['test3'] = np.where((DF['X'] == 0) | (pd.isnull(DF['X'])), 1, 0)
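For what it's worth, a quick sketch comparing the two test3 variants on this exact frame. It only shows that they agree for this float64 column; it is not meant as proof that they agree for every dtype or input:

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [0, 3, None]})

# Variant 1: rely on isin() matching NaN
via_isin = np.where(DF['X'].isin([0, np.nan]), 1, 0)

# Variant 2: explicit equality plus an explicit missing-value check
via_isnull = np.where((DF['X'] == 0) | pd.isnull(DF['X']), 1, 0)

print(via_isin)                               # [1 0 1]
print(via_isnull)                             # [1 0 1]
print(np.array_equal(via_isin, via_isnull))   # True
```

The second variant states the "is missing" intent directly, while the first depends on isin() treating NaN as equal to NaN.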