
Until a few days ago I was unaware of the __index__() method; I only learned about it from reading this question. Since then, I have been reading about it in the documentation, the PEP and other SO questions.

My understanding was that whenever the [] operator is used on objects that can be sliced (in my case lists, NumPy arrays and pandas objects), the value used for slicing or indexing is obtained through __index__, so that lst[key] == lst[key.__index__()] holds.

However, in one of those questions the result depended on whether PyPy or CPython was used, so I decided to check when slicing actually uses __index__ and when it does not. I ran the following (in CPython 2.7.14):

import numpy as np
import pandas as pd

lst = range(10)  # a plain list in Python 2
array = np.arange(10)
series = pd.Series(lst)

And defined the following classes:

class MyIndex:
    def __index__(self):
        return 2
class MyInt(int):
    def __index__(self):
        return 3
class MyStr(str):
    def __index__(self):
        return 4

Then I tried to access the objects above using instances of these user-defined classes, obtaining the following:

Note: I am not posting the complete error message for readability purposes.

For MyIndex class, expected output 2:

print lst[MyIndex()]
print array[MyIndex()]
print series[MyIndex()]
# Output:
2
2
AttributeError: MyIndex instance has no attribute '__trunc__'

For MyInt class, expected output 3:

# Case 1
print lst[MyInt()]
print array[MyInt()]
print series[MyInt()]
# Output
0
0
0

# Case 2
print lst[MyInt(2)]
print array[MyInt(2)]
print series[MyInt(2)]
# Output
2
2
2

For MyStr class, expected output 4:

# Case 1
print lst[MyStr()]
print array[MyStr()]
print series[MyStr()]
# Output
4
4
KeyError: ''

# Case 2
print lst[MyStr('a')]
print array[MyStr('a')]
print series[MyStr('a')]
# Output
4
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
KeyError: 'a'

I'm really puzzled by this, mainly by the following points:

  • With lists, the __index__ method is used, but not for int and its subclasses.
  • NumPy uses __index__ like lists do, but in the last case MyStr('a') raises an error. Am I missing something, or is __index__ only used here when MyStr is an empty string?
  • Pandas indexing is a world of its own and even accepts slicing on an ordered string index, so it is a relief that __index__ is not used. Thus, my only question about pandas is whether the output of some code could differ depending on the Python implementation.

My question is basically the one in the title:

When is __index__ called for lists and numpy arrays? Why are there some exceptions?

Having said that, I will be happy to receive any extra information I may have missed about this method.

OriolAbril
  • It's instructive to define a class with a `__getitem__` method, and look at the `args` tuple that indexing gives it. `numpy` `index_tricks.py` uses this to create a number of pseudo indexing functions, such as `np.r_` and `np.mgrid`. – hpaulj Apr 05 '18 at 20:31
  • It's only in recent releases that `numpy` rejects floats as indices. It used to allow them, truncating as needed. – hpaulj Apr 05 '18 at 20:32
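
As a minimal sketch of the first comment's suggestion (the class name `Probe` is hypothetical, and the snippet assumes Python 3), a `__getitem__` that simply returns its argument shows what the `[]` operator actually passes along:

```python
class Probe(object):
    """Hypothetical helper: hands back whatever key [] passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = Probe()

print(probe[3])          # a plain int
print(probe[1:5:2])      # a slice object
print(probe[1, 2])       # a tuple of two ints
print(probe[..., None])  # a tuple holding Ellipsis and None
```

No conversion happens at this level: the key arrives untouched, so any call to `__index__` is up to the container's own `__getitem__` (or its C-level equivalent).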

1 Answer


First, quoting the docs for __index__:

Called to implement operator.index(), and whenever Python needs to losslessly convert the numeric object to an integer object (such as in slicing, or in the built-in bin(), hex() and oct() functions). Presence of this method indicates that the numeric object is an integer type. Must return an integer.

Note: In order to have a coherent integer type class, when __index__() is defined __int__() should also be defined, and both should return the same value.

__index__ usually isn't called if an object is already an int, since no conversion is needed. Also, you need an __int__ method to go with __index__; some of your problems come from that. (Your MyInt inherits int.__int__, but its __index__ behavior isn't consistent with what it inherits from int, so that's also a problem.)
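
As a rough illustration of what the docs ask for (the class name `Two` is made up for this example, and the snippet assumes Python 3), an integer-like type whose `__int__` and `__index__` agree behaves consistently wherever an integer is needed:

```python
import operator


class Two(object):
    """Hypothetical integer-like type; __int__ and __index__ return the same value."""

    def __index__(self):
        return 2

    def __int__(self):
        return 2


two = Two()

print(operator.index(two))   # 2
print(int(two))              # 2
print(bin(two))              # 0b10
print(list(range(10))[two])  # 2
```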


In CPython, lists implement the C-level sequence protocol, and CPython automatically calls __index__ for non-ints before invoking the sequence protocol. Ints just get their int value used, and your MyInt() has an int value of 0. You can trace the call chain for __index__ through PyObject_GetItem, PyNumber_AsSsize_t, and PyNumber_Index if you want.
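
The difference can be reproduced with a short sketch. It reuses the question's class names but inherits from `object`, and assumes CPython 3; as the question notes, another implementation such as PyPy may handle the int subclass differently.

```python
class MyIndex(object):
    def __index__(self):
        return 2


class MyInt(int):
    def __index__(self):
        return 3


lst = list(range(10))

# Not an int: CPython converts it through __index__ before indexing.
print(lst[MyIndex()])   # 2
print(lst[MyIndex():])  # [2, 3, 4, 5, 6, 7, 8, 9]

# Already an int: CPython uses the underlying int value, not __index__.
print(lst[MyInt()])     # 0 on CPython
print(lst[MyInt(5)])    # 5 on CPython
```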


NumPy arrays don't use the sequence protocol for indexing. They implement it, but they also implement the mapping protocol, which takes priority. NumPy arrays handle index processing themselves.

One of the things they try is PyNumber_Index, which is why they behave like lists for most of your tests. However, NumPy arrays support a lot more complex indexing than lists, and one part of the NumPy array indexing implementation is a weird special case where certain non-tuple sequences get treated as index tuples.

Your MyStr objects are sequences, and MyStr('a') triggers the special case. It gets treated as tuple(MyStr('a')), or ('a',), which isn't a valid indexing tuple.
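
The sequence side of this can be checked without NumPy. The sketch below assumes Python 3 and only shows that a `MyStr` instance is a sequence and what converting it to a tuple yields; it does not call into NumPy's indexing code.

```python
from collections.abc import Sequence


class MyStr(str):
    def __index__(self):
        return 4


# A str subclass is still a sequence of one-character strings.
print(isinstance(MyStr('a'), Sequence))  # True

# Treated as an index tuple, it holds a string, which is not a valid index.
print(tuple(MyStr('a')))                 # ('a',)
print(tuple(MyStr()))                    # ()
```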


As for Pandas, pandas.Series implements __getitem__ at Python level. It also has to process indexes manually.

For MyIndex(), it looks like Pandas tried to call int on your MyIndex() object, which failed because you don't have an __int__ method. The error would normally have been a TypeError, which Pandas would probably handle differently, but you forgot to inherit from object, so you got a classic class, and those are weird.

Your MyInt() objects are ints and were used as ints, same as with the list and array tests.

Your MyStr() objects are strings, and Pandas treated them as strings instead of trying to interpret them as ints.

user2357112
  • My only remaining doubt is what to expect for ints whose __index__ returns a different value than __int__. In PyPy it looks like __index__ takes precedence (https://stackoverflow.com/questions/49633222/slicing-elements-from-a-python-list-using-boolean-indexing) – OriolAbril Apr 05 '18 at 19:35
  • @xg.plt.py: If your `__int__` and your `__index__` don't match, your object is broken. Python does not specify which one wins, and anything you glean from examining the implementation is just implementation details, subject to change without notice. – user2357112 Apr 05 '18 at 19:38