
Python does a lot with magic methods and most of these are part of some protocol. I am familiar with the "iterator protocol" and the "number protocol" but recently stumbled over the term "sequence protocol". But even after some research I'm not exactly sure what the "sequence protocol" is.

For example, the C API function PySequence_Check checks (according to the documentation) whether an object implements the "sequence protocol". The source code indicates that this means an instance of a class that's not a dict but implements a __getitem__ method, which is roughly what the documentation on iter also states:

[...]must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).[...]

But the requirement to start with 0 isn't something that's "implemented" in PySequence_Check.

Then there is also the collections.abc.Sequence type, which basically says the instance has to implement __reversed__, __contains__, __iter__ and __len__.

But by that definition a class implementing the "sequence protocol" isn't necessarily a Sequence. For example, the "data model" and the abstract class guarantee that a sequence has a length, but a class that implements only __getitem__ (and so passes PySequence_Check) raises an exception on len(an_instance_of_that_class).
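A minimal sketch of that mismatch, assuming a hypothetical class `OnlyGetItem` (the name and its contents are made up for illustration):

```python
class OnlyGetItem:
    """Hypothetical class that defines __getitem__ and nothing else."""

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 10
        raise IndexError(index)


obj = OnlyGetItem()

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
items = list(iter(obj))

# len() has no such fallback: without __len__ it raises TypeError.
try:
    len(obj)
    len_error = None
except TypeError as exc:
    len_error = exc
```

Here `items` is built purely through the `__getitem__` fallback, while `len_error` holds the `TypeError` raised by `len`.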

Could someone please clarify for me the difference between a sequence and the sequence protocol (if there's a definition for the protocol besides reading the source code) and when to use which definition?

Géry Ogam
MSeifert
  • `collections.abc.Sequence` requires `__getitem__` and `__len__`. There are mixin methods for everything else. Regarding iteration, if just `__getitem__` is defined without an `__iter__`, then built-in `iter` instantiates a simple iterator that starts at index 0. For `reversed` to work `__len__` also has to be defined, so it can start at the last index. – Eryk Sun Apr 23 '17 at 00:36
  • @eryksun But a class doesn't need a `__len__` to implement the sequence protocol (as far as `PySequence_Check` is concerned). And a class implementing `__len__` and `__getitem__` but not inheriting from `collections.abc.Sequence` isn't passing `isinstance(an_instance, Sequence)`. That is what triggered my question. :) – MSeifert Apr 23 '17 at 00:42

2 Answers


It's not really consistent.

Here's PySequence_Check:

int
PySequence_Check(PyObject *s)
{
    if (PyDict_Check(s))
        return 0;
    return s != NULL && s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}

PySequence_Check checks if an object provides the C sequence protocol, implemented through the tp_as_sequence member of the PyTypeObject representing the object's type. This tp_as_sequence member is a pointer to a struct of function pointers for sequence behavior, such as sq_item for item retrieval by numeric index and sq_ass_item for item assignment.

Specifically, PySequence_Check requires that its argument is not a dict, and that it provides sq_item.

Types with a __getitem__ written in Python will provide sq_item regardless of whether they're conceptually sequences or mappings, so a mapping written in Python that doesn't inherit from dict will pass PySequence_Check.
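One way to observe this from Python is to call the C function through `ctypes`. This is only a sketch and assumes CPython; the classes `PyMapping` and `DictSubclass` are hypothetical and exist just for the comparison:

```python
import ctypes
from collections.abc import Mapping

# Bind the C API function; this only works on CPython.
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int


class PyMapping(Mapping):
    """Hypothetical mapping written in Python that does not inherit from dict."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class DictSubclass(dict):
    """Hypothetical dict subclass; PyDict_Check accepts subclasses too."""


results = {
    "list": PySequence_Check([1, 2]),
    "dict": PySequence_Check({"a": 1}),
    "dict subclass": PySequence_Check(DictSubclass()),
    "Python mapping": PySequence_Check(PyMapping({"a": 1})),
}
```

On CPython the list and the Python-level mapping should both report 1, while the dict and its subclass report 0.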


On the other hand, collections.abc.Sequence only checks whether an object concretely inherits from collections.abc.Sequence or whether its class (or a superclass) is explicitly registered with collections.abc.Sequence. If you just implement a sequence yourself without doing either of those things, it won't pass isinstance(your_sequence, Sequence). Also, most classes registered with collections.abc.Sequence don't support all of collections.abc.Sequence's methods. Overall, collections.abc.Sequence is a lot less reliable than people commonly expect it to be.
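A short sketch of both effects, using two hypothetical classes (`DuckSequence` and `RegisteredSequence` are made-up names):

```python
from collections.abc import Sequence


class DuckSequence:
    """Hypothetical class: behaves like a sequence, but is unrelated to the ABC."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


class RegisteredSequence(DuckSequence):
    """Same behaviour, registered below as a virtual subclass."""


Sequence.register(RegisteredSequence)

duck = DuckSequence("abc")
registered = RegisteredSequence("abc")

# No inheritance and no registration, so the isinstance check fails.
duck_is_sequence = isinstance(duck, Sequence)

# Explicit registration makes the check pass.
registered_is_sequence = isinstance(registered, Sequence)

# Registration does not add the mixin methods such as index() and count().
registered_has_index = hasattr(registered, "index")
```

The registered class passes `isinstance` but still lacks `index`, which is one way a class can be "a Sequence" without supporting the full interface.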


As for what counts as a sequence in practice, it's usually anything that supports __len__ and __getitem__ with integer indexes starting at 0 and isn't a mapping. If the docs for a function say it takes any sequence, that's almost always all it needs. Unfortunately, "isn't a mapping" is hard to test for, for reasons similar to how "is a sequence" is hard to pin down.
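To illustrate why this is hard to test, here is a heuristic sketch. The helper `looks_like_sequence` and the class `UnregisteredMapping` are hypothetical, and the check is deliberately naive:

```python
from collections.abc import Mapping


def looks_like_sequence(obj):
    """Heuristic only: has __len__ and __getitem__ and is not a known Mapping."""
    cls = type(obj)
    return (
        hasattr(cls, "__len__")
        and hasattr(cls, "__getitem__")
        and not isinstance(obj, Mapping)
    )


class UnregisteredMapping:
    """Hypothetical key/value container that never tells the ABCs it is a mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]


checks = {
    "list": looks_like_sequence([1, 2]),
    "str": looks_like_sequence("abc"),
    "dict": looks_like_sequence({"a": 1}),
    "set": looks_like_sequence({1, 2}),
    "unregistered mapping": looks_like_sequence(UnregisteredMapping({"a": 1})),
}
```

The heuristic accepts lists and strings and rejects dicts and sets, but it also accepts the unregistered mapping, which is exactly the false positive described above.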

user2357112
  • I think `PySequence_Check` excludes `dict`s because subclasses could implement `__getitem__`; custom classes do return `True` when they implement `__getitem__`: https://gist.github.com/MSeifert04/e39d91f9d262618a32f7db14aaab15f4. Thank you for the answer (that `collections.abc.Sequence` doesn't have a `__subclasshook__` was new to me in particular). I'll leave it unaccepted for another day in case someone else wants to provide an answer. – MSeifert Apr 23 '17 at 01:07
  • @MSeifert: Yeah, I was wrong about user-defined classes and `sq_item`. I could've sworn there wasn't handling to provide `sq_item` by wrapping `__getitem__`, but apparently there is, and it's not new. – user2357112 Apr 23 '17 at 01:13
  • See [issue 23864](http://bugs.python.org/issue23864) regarding the limitations for `issubclass` with the ABCs that aren't "one-trick ponies". It's always seemed needlessly limited to me. – Eryk Sun Apr 23 '17 at 01:23
  • @eryksun I have asked Guido if he can add the missing `__subclasshook__` in your issue as it is still open. By the way, do you know why the table of [Collections Abstract Base Classes](https://docs.python.org/3.7/library/collections.abc.html) is missing some abstract or mixin methods (for instance, the class `Reversible` is missing the mixin method `__iter__` and the class `Coroutine` is missing the abstract method `__await__`)? – Géry Ogam Dec 10 '18 at 00:27
  • @Maggyero: Those are both inherited abstract methods. `Reversible` inherits `__iter__` from `Iterable`, and `Coroutine` inherits `__await__` from `Awaitable`. (Also, `__iter__` isn't a mixin method because implementing `__iter__` in terms of `__reversed__` would be inefficient and bizarre.) – user2357112 Dec 10 '18 at 03:19
  • @user2357112 Okay, but why aren't these inherited abstract methods listed in the column _Abstract Methods_ of the `Reversible` and `Coroutine` classes (contrary to the inherited abstract methods of the `Collection` class for instance which are listed)? – Géry Ogam Dec 10 '18 at 07:55

To conform to the sequence protocol, a type must support these four operations:

  • Retrieve elements by index

    item = seq[index]

  • Find items by value

    index = seq.index(item)

  • Count items

    num = seq.count(item)

  • Produce a reversed sequence

    r = reversed(seq)
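As a sketch of how a single class can support all four operations, the hypothetical `Countdown` below inherits from `collections.abc.Sequence` and defines only `__len__` and `__getitem__`; `index`, `count` and `__reversed__` then come from the ABC's mixin methods. The class name and its behaviour are assumptions made for illustration:

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical sequence n, n-1, ..., 1 defining only the two abstract methods."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index


seq = Countdown(4)           # represents 4, 3, 2, 1

item = seq[0]                # retrieve by index
position = seq.index(2)      # find by value (mixin method)
num = seq.count(3)           # count items (mixin method)
r = list(reversed(seq))      # reversed iteration (mixin __reversed__)
```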

Community
FruitBat