
I have defined a class whose __ge__ method returns an instance of the class, and whose __bool__ method must never be invoked (similar to a pandas Series).

Why is X.__bool__ invoked during np.int8(0) <= x, but not for any of the other examples? Who is invoking it? I have read the Data Model docs but I haven’t found my answer there.

import numpy as np
import pandas as pd

class X:
    def __bool__(self):
        print(f"{self}.__bool__")
        assert False
    def __ge__(self, other):
        print(f"{self}.__ge__")
        return X()

x = X()

np.int8(0) <= x

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D90>.__bool__
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 4, in __bool__
# AssertionError

0 <= x

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5DF0>

x >= np.int8(0)

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D30>


pd_ge = pd.Series.__ge__
def ge_wrapper(self, other):
    print("pd.Series.__ge__")
    return pd_ge(self, other)

pd.Series.__ge__ = ge_wrapper

pd_bool = pd.Series.__bool__
def bool_wrapper(self):
    print("pd.Series.__bool__")
    return pd_bool(self)

pd.Series.__bool__ = bool_wrapper


np.int8(0) <= pd.Series([1,2,3])

# Console output:
# pd.Series.__ge__
# 0    True
# 1    True
# 2    True
# dtype: bool
Mike R

2 Answers


I suspect that np.int8.__le__ is defined so that, instead of returning NotImplemented and letting X.__ge__ take over, it tries to return something like not (np.int8(0) > x), and np.int8.__gt__ then returns NotImplemented. Once the reflected call on x returns an instance of X rather than a Boolean value, Python has to call __bool__() on that instance in order to evaluate the not.

(Still trying to track down where int8.__gt__ is defined to confirm.)

(Update: not quite. int8 uses a single generic rich comparison function that simply converts the value to a 0-dimensional array, then returns the result of PyObject_RichCompare on the array and x.)
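For comparison, the fallback that the plain 0 <= x case relies on can be checked in pure Python. This is only a sketch of the standard reflected-comparison protocol, with a stripped-down copy of X that logs calls into a list (the calls list is my addition, not part of the question's code):

```python
class X:
    """Stand-in for the X class in the question, with call logging."""
    calls = []

    def __bool__(self):
        X.calls.append("__bool__")
        raise AssertionError("__bool__ must not be called")

    def __ge__(self, other):
        X.calls.append("__ge__")
        return X()


x = X()

# int does not know how to compare itself with X, so it declines...
declined = int.__le__(0, x)
print(declined)  # NotImplemented

# ...and the interpreter then tries the reflected method, X.__ge__.
# The result is handed back untouched, so __bool__ is never needed.
result = 0 <= x
print(type(result).__name__, X.calls)
```

A left operand that does not return NotImplemented keeps control of the comparison, and whatever it does with the intermediate result (including truth-testing it) is up to that operand.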


I did find this function that appears to ultimately implement np.int8.__le__:

static NPY_INLINE int
rational_le(rational x, rational y) {
    return !rational_lt(y,x);
}

It's not clear to me how we avoid getting to this function if one of the arguments (like X) would not be a NumPy type. I think I give up.

chepner
  • I appreciate you looking into this for me. It was quite a headache to debug, and I gather you felt the same :) – Mike R Jun 11 '21 at 06:52

TL;DR

X.__array_priority__ = 1000


The biggest hint is that it works with a pd.Series.

First I tried having X inherit from pd.Series. This worked (i.e. __bool__ no longer called).

To determine whether NumPy is using an isinstance check or duck-typing approach, I removed the explicit inheritance and added (based on this answer):

@property
def __class__(self):
    return pd.Series

The operation no longer worked (i.e. __bool__ was called).
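One caveat about this experiment, as far as I understand it: a __class__ property fools Python-level isinstance, but it does not change what type() reports, and type-level checks are what C code usually performs. A minimal sketch, using a hypothetical Base class as a stand-in for pd.Series:

```python
class Base:
    """Hypothetical stand-in for pd.Series."""


class Spoof:
    # Pretend to be a Base without inheriting from it.
    @property
    def __class__(self):
        return Base


s = Spoof()

print(isinstance(s, Base))        # True: isinstance consults __class__
print(type(s) is Base)            # False: the real type is still Spoof
print(issubclass(type(s), Base))  # False
```

So the spoofed object failing does not, on its own, tell us whether the C side checks the real type or looks for attributes.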

So now I think we can conclude NumPy is using a duck-typing approach. So I checked to see what attributes are being accessed on X.

I added the following to X:

def __getattribute__(self, item):
    print("getattr", item)
    return object.__getattribute__(self, item)

Again instantiating X as x, and invoking np.int8(0) <= x, we get:

getattr __array_priority__
getattr __array_priority__
getattr __array_priority__
getattr __array_struct__
getattr __array_interface__
getattr __array__
getattr __array_prepare__
<__main__.X object at 0x000002022AB5DBE0>.__ge__
<__main__.X object at 0x000002021A73BE50>.__bool__
getattr __array_struct__
getattr __array_interface__
getattr __array__
Traceback (most recent call last):
  File "<stdin>", line 32, in <module>
    np.int8(0) <= x
  File "<stdin>", line 21, in __bool__
    assert False
AssertionError

Ah-ha! What is __array_priority__? Who cares, really. With a little digging, all we need to know is that NDFrame (from which pd.Series inherits) sets this value to 1000.

If we add X.__array_priority__ = 1000, it works! __bool__ is no longer called.
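To make the effect easier to picture, here is a toy model in pure Python. It is emphatically not NumPy's implementation: ToyScalar, its default priority of -1000.0, and the Plain and Prioritized classes are all hypothetical names I made up. It only mimics the behaviour observed above, where the left operand either defers to a higher-priority object or handles the comparison itself and truth-tests the result.

```python
class ToyScalar:
    """Toy left operand; NOT NumPy's code, only a model of the behaviour."""
    __array_priority__ = -1000.0  # hypothetical low default

    def __init__(self, value):
        self.value = value

    def __le__(self, other):
        mine = type(self).__array_priority__
        theirs = getattr(other, "__array_priority__", mine)
        if theirs > mine:
            # Defer: Python will try the reflected other.__ge__(self).
            return NotImplemented
        # Otherwise handle the comparison here and coerce to a bool,
        # which is where the __bool__ of the result gets invoked.
        return bool(other >= self.value)


class Plain:
    def __bool__(self):
        raise AssertionError("__bool__ called")

    def __ge__(self, other):
        return type(self)()


class Prioritized(Plain):
    __array_priority__ = 1000


try:
    ToyScalar(0) <= Plain()
    outcome = "no error"
except AssertionError as exc:
    outcome = str(exc)

result = ToyScalar(0) <= Prioritized()

print(outcome)                # __bool__ called
print(type(result).__name__)  # Prioritized
```

In the model, the only difference between the two right-hand operands is the priority attribute, which matches what adding __array_priority__ = 1000 to X changed in the real case.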

What made this so difficult (I believe) is that the NumPy code didn't show up in the call stack because it is written in C. I could investigate further if I tried out the suggestion here.

Mike R
  • I think we converged :) The attributes you traced, I think, are from the array produced from the original scalar in order to evaluate the rich comparison. – chepner Jun 11 '21 at 12:43