I want to do an element-wise OR operation on two pandas Series of boolean values. np.nans are also included.

I have tried three approaches and found that the expression "np.nan or False" can evaluate to True, False, or np.nan, depending on the approach.

These are my example Series:

import numpy as np
import pandas as pd

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])
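
It may be worth checking the dtypes first, since the NaN changes how series_1 is stored. A minimal check, using the same two Series:

```python
import numpy as np
import pandas as pd

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

# A NaN cannot be stored in a plain bool array, so pandas falls back to object.
print(series_1.dtype)  # object
print(series_2.dtype)  # bool
```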

Approach #1

Using the | operator of pandas:

In [5]: series_1 | series_2
Out[5]: 
0     True
1    False
2    False
dtype: bool

Approach #2

Using the logical_or function from numpy:

In [6]: np.logical_or(series_1, series_2)
Out[6]: 
0     True
1    False
2      NaN
dtype: object

Approach #3

I define a vectorized version of logical_or which is supposed to be evaluated row-by-row over the arrays:

@np.vectorize
def vectorized_or(a, b):
   return np.logical_or(a, b)

I use vectorized_or on the two series and convert its output (which is a numpy array) into a pandas Series:

In [8]:  pd.Series(vectorized_or(series_1, series_2))
Out[8]: 
0     True
1    False
2     True
dtype: bool

Question

I am wondering about the reasons for these results.
This answer explains np.logical_or and says np.logical_or(np.nan, False) is True, but why does this only work when vectorized and not in Approach #2? And how can the results of Approach #1 be explained?

  • [This page](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-stats) in the docs explains that the default behaviour of these pandas functions is set to skip NaNs in the data – Peter9192 May 10 '16 at 07:47
  • Does a boolean series even support `np.nan`? What about `pd.NA` rather? – jtlz2 Mar 13 '23 at 12:43
  • @Peter9192 Link has moved: https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html – jtlz2 Mar 13 '23 at 12:44
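
On the pd.NA question raised in the comments: a minimal sketch, assuming pandas 1.0 or later (where the nullable "boolean" dtype is available). The names nullable_or and filled_or are only illustrative. With this dtype the missing value is kept as pd.NA and `|` follows three-valued logic, so the treatment of the missing row becomes an explicit choice:

```python
import numpy as np
import pandas as pd

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

# Nullable boolean dtype: NaN becomes pd.NA, and NA | False stays NA
# because the result depends on the unknown value.
nullable_or = series_1.astype("boolean") | series_2.astype("boolean")
print(nullable_or)

# Alternative: state explicitly that a missing value counts as False.
filled_or = series_1.astype("boolean").fillna(False) | series_2.astype("boolean")
print(filled_or)
```

Either form avoids relying on how NaN happens to be treated in an object-dtype Series.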

1 Answer

First difference: | corresponds to np.bitwise_or, which explains the difference between #1 and #2.
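
A small sketch of what bitwise OR does with these values, using plain NumPy object arrays (left and right are illustrative stand-ins for the two Series). On object data it applies Python's `|` element by element, and `|` is not defined between a float NaN and a bool, so the NaN row raises. The pandas `|` operator evidently adds its own NaN handling on top, since Approach #1 returns False for that row instead of failing:

```python
import numpy as np

# Same values as series_1 / series_2, as plain NumPy arrays.
left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False])

# bool | bool works, but float | bool is not defined in Python,
# so the row containing NaN raises a TypeError.
try:
    np.bitwise_or(left, right)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```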

Second difference: since series_1.dtype is object (non-homogeneous data), the operations are done row by row in the first two cases.
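
To illustrate the row-by-row behaviour for #2, here is a sketch on plain object arrays (left and right are illustrative names). With object inputs, np.logical_or behaves like Python's `or` for each pair: it returns the first operand if it is truthy, otherwise the second. NaN is truthy, so the NaN itself comes back rather than True:

```python
import math
import numpy as np

left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)

# Object inputs give an object result holding the selected operands.
result = np.logical_or(left, right)
print(result.dtype)  # object
print(result)

# Plain Python does the same for the last row.
last = np.nan or False
print(last)  # nan
```

This differs from the scalar call np.logical_or(np.nan, False), where the NaN is treated as a float and the result is a real boolean.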

When using vectorize (#3):

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

For vectorized operations, you leave object mode: the data are first converted according to the first element (bool here, and bool(nan) is True), and the operations are done afterwards.
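
A sketch of the vectorized case on plain arrays (left and right are illustrative stand-ins for the two Series). Each call receives scalars, and the output dtype is taken from the first call, which returns a NumPy bool here:

```python
import numpy as np

@np.vectorize
def vectorized_or(a, b):
    return np.logical_or(a, b)

left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False])

# On a float scalar, logical_or tests for "non-zero", and NaN is non-zero.
print(np.logical_or(np.nan, False))  # True
print(bool(np.nan))                  # True

# First call is logical_or(True, False), a NumPy bool, so the whole
# output becomes a bool array and the NaN row ends up as True.
result = vectorized_or(left, right)
print(result.dtype)  # bool
print(result)        # [ True False  True]
```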

jtlz2
B. M.
  • I understand the effect of vectorization now. But you say that in the second case the operations are done row by row. The way I see it, the last operation is `np.logical_or(np.nan, False)`, which is `True`, so why is the last element of the second result a `NaN`? – DataOmbudsman May 10 '16 at 11:06
  • Your other remark about `|` being `np.bitwise_or` is helpful, although using `np.bitwise_or` directly with the two Series gives a TypeError, so they are not completely the same. – DataOmbudsman May 10 '16 at 11:09