I want to do an element-wise OR operation on two pandas Series of boolean values. The Series can also contain np.nan values.
I have tried three approaches and realized that the expression "np.nan or False" can be evaluated to True, False, or np.nan depending on the approach.
These are my example Series:
import numpy as np
import pandas as pd

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])
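Since the three approaches may treat the inputs differently depending on how they are stored, it can help to look at the dtypes first. This is only a minimal sketch that rebuilds the two example Series from above: the mixed True/False/np.nan input ends up with object dtype, while the all-boolean Series is stored as bool.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

# Mixing Python bools with a float NaN forces a generic object dtype.
print(series_1.dtype)  # object
# A Series holding only bools gets the native bool dtype.
print(series_2.dtype)  # bool
```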
Approach #1
Using the | operator of pandas:
In [5]: series_1 | series_2
Out[5]:
0 True
1 False
2 False
dtype: bool
Approach #2
Using the logical_or function from numpy:
In [6]: np.logical_or(series_1, series_2)
Out[6]:
0 True
1 False
2 NaN
dtype: object
Approach #3
I define a vectorized version of logical_or which is supposed to be evaluated row-by-row over the arrays:
@np.vectorize
def vectorized_or(a, b):
    return np.logical_or(a, b)
I use vectorized_or on the two Series and convert its output (which is a numpy array) into a pandas Series:
In [8]: pd.Series(vectorized_or(series_1, series_2))
Out[8]:
0 True
1 False
2 True
dtype: bool
Question
I am wondering about the reasons for these results.
This answer explains np.logical_or and says that np.logical_or(np.nan, False) is True, but why does this only work when vectorized and not in Approach #2? And how can the results of Approach #1 be explained?
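For comparison, the scalar cases can be checked directly, outside of any Series. This is a small sketch, not an explanation of the pandas behaviour: the names python_or and numpy_or are only illustrative, and it compares plain Python's "or" with np.logical_or applied to the scalars np.nan and False.

```python
import numpy as np

# Plain Python `or` returns its first operand if that operand is truthy.
# NaN is a non-zero float, so it counts as truthy and is returned unchanged.
python_or = np.nan or False

# np.logical_or on scalar inputs converts them to a numeric type and
# returns a NumPy boolean instead of one of the operands.
numpy_or = np.logical_or(np.nan, False)

print(bool(np.nan))  # truthiness of NaN
print(python_or)     # result of the plain Python expression
print(numpy_or)      # result of the NumPy ufunc
```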