This is essentially a rehashing of the content of my answer here.
I came across some weird behaviour when trying to solve this question, using pd.notnull.
Consider
x = ('A4', np.nan)
I want to check which of these items are null. Using np.isnan directly will throw a TypeError (but I've figured out how to solve that).
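For reference, a minimal sketch reproducing the TypeError mentioned above. It only assumes numpy is imported as np; the variable name raised is just for illustration.

```python
import numpy as np

x = ('A4', np.nan)

# np.isnan first converts the tuple to an array; with a str in the mix the
# array gets a string dtype, which the isnan ufunc does not support.
try:
    np.isnan(x)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```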
Using pd.notnull does not work.
>>> pd.notnull(x)
True
It treats the tuple as a single value (rather than an iterable of values). Furthermore, converting this to a list and then testing also gives an incorrect answer.
>>> pd.notnull(list(x))
array([ True, True])
Since the second value is nan, the result I'm looking for should be [True, False]. It finally works when you pre-convert to a Series:
>>> pd.Series(x).notnull()
0 True
1 False
dtype: bool
So, the solution is to Series-ify it and then test the values.
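The Series-based check could be wrapped in a small helper. This is only a sketch: the name notnull_elementwise is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def notnull_elementwise(values):
    """Return one bool per element, True where the element is not null."""
    # A Series built from mixed str/float data keeps object dtype, so the
    # NaN stays a float NaN instead of being coerced to the string 'nan'.
    return pd.Series(values).notnull().tolist()

x = ('A4', np.nan)
result = notnull_elementwise(x)
print(result)
```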
Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object dtype numpy array, after which pd.notnull works directly (np.isnan does not; it still raises a TypeError on object arrays):
>>> pd.notnull(np.array(x, dtype=object))
array([ True, False])
I imagine that pd.notnull directly converts x to a string array under the covers, rendering the NaN as a string "nan", so it is no longer a "null" value.
Is pd.notnull doing the same thing here? Or is there something else going on under the covers that I should be aware of?
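To illustrate the kind of coercion I suspect, here is a minimal sketch using plain numpy. It shows what np.array does by default with mixed str/float input; whether pd.notnull takes the same path internally is exactly the open question, so treat this as an assumption rather than a description of pandas internals.

```python
import numpy as np

x = ('A4', np.nan)

# Without an explicit dtype, numpy picks a common type for str and float,
# which is a unicode string dtype. The NaN becomes the text 'nan'.
as_str = np.array(x)
print(as_str.dtype.kind)   # 'U' means unicode string
print(repr(as_str[1]))

# With dtype=object each element is stored as-is, so the NaN survives
# as a float.
as_obj = np.array(x, dtype=object)
print(type(as_obj[1]))
```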
Notes
In [156]: pd.__version__
Out[156]: '0.22.0'