12

The Python docs are a bit ambiguous

sequence

An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.

The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().

In particular, using abc.collections.Sequence as the gold standard as recommended by some would mean that, for example, numpy arrays are not sequences:

isinstance(np.arange(6),collections.abc.Sequence)
# False

There is also something called the Sequence Protocol but that appears to be exposed only at the C-API. There the criterion is

int PySequence_Check(PyObject *o)

Return 1 if the object provides sequence protocol, and 0 otherwise. Note that it returns 1 for Python classes with a __getitem__() method unless they are dict subclasses since in general case it is impossible to determine what the type of keys it supports. This function always succeeds.

Finally, I don't follow this new (-ish) type annotation business too closely but I would imagine this also would benefit from a clear concept of what a sequence is.

So my question has both a philosophical and a practical side: What exactly is a sequence? and How do I test whether something is a sequence or not? Ideally, in a way that makes numpy arrays sequences. And if I ever start annotating, how would I approach sequences?

Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
  • In `numpy` code we see errors like "setting an array element with a sequence". A list or tuple would raise that, but not a single element array. `arr[0] = [1]` gives the error; `arr[0]=np.array([1])` does not. `arr[0]=np.array([1,2])` does. – hpaulj Jul 18 '20 at 19:01
  • The practical answer is that a type matching Sequence requires active registration (via inheritance or Sequence.register) and definition of methods, and no one bothered to do that. – MisterMiyagi Jul 26 '20 at 08:48
  • 1
    Mostly duplicate of https://stackoverflow.com/questions/43566044/what-is-pythons-sequence-protocol – user2357112 Jul 26 '20 at 09:07
  • @user2357112supportsMonica I'm aware of that post and had in fact already referenced it. The annotation angle is new, though, and seems to me to add one more reason to seek clarity. Also,`.index` and `.count` are not mentioned in that thread, so I was wondering whether they are new and `collections.abc.Sequence` has inched two methods farther away from core/minimal sequence requirements. – Paul Panzer Jul 26 '20 at 09:35
  • For annotations, you're pretty much stuck using `typing.Sequence` or nothing. I recommend nothing, if you want to support NumPy arrays. The sequence-ness of a NumPy array isn't even fully determined by its type - 0-dimensional arrays don't support `len` or integer indices. – user2357112 Jul 26 '20 at 09:42
  • @user2357112supportsMonica 0D arrays are just an edge case we have to live with, not entirely unlike empty lists which also do not support integer indexing and are also generally quite good at punishing sloppy programming etc. – Paul Panzer Jul 26 '20 at 09:55
  • 2
    Not that it answers the core of the question, but, for practical purposes, if you want to use a type annotation specifically for sequences and NumPy arrays, you could also use [`Union`](https://docs.python.org/3/library/typing.html#typing.Union): `SequenceOrNdarray = Union[Sequence[Any], np.ndarray]`. – jdehesa Jul 30 '20 at 14:28

3 Answers3

11

Brief introduction to typing in Python

Skip ahead if you know what structural typing, nominal typing and duck typing are.

I think much of the confusion arises from the fact that typing was a provisional module between versions 3.5 and 3.6. And was still subject to change between versions 3.7 and 3.8. This means there has been a lot of flux in how Python has sought to deal with typing through type annotations.

It also doesn't help that python is both duck-typed and nominally typed. That is, when accessing an attribute of an object, Python is duck-typed. The object will only be checked to see if it has an attribute at runtime, and only when immediately requested. However, Python also has nominal typing features (eg. isinstance()and issubclass()). Nominal typing is where one type is declared to be a subclass of another. This can be through inheritance, or with the register() method of ABCMeta.

typing originally introduced its types using the idea of nominal typing. As of 3.8 it is trying to allow for the more pythonic structural typing. Structural typing is related to duck-typing, except that it is taken into consideration at "compile time" rather than runtime. For instance, when a linter is trying to detect possible type errors -- such as if you were to pass a dict to a function that only accepts sequences like tuples or list. With structural typing, a class B should be considered a subtype of A if it implements the all the methods of A, regardless of whether it has been declared to be a subtype of A (as in nominal typing).

Answer

sequences (little s) are a duck type. A sequence is any ordered collection of objects that provides random access to its members. Specifically, if it defines __len__ and __getitem__ and uses integer indices between 0 and n-1 then it is a sequence. A Sequence (big s) is a nominal type. That is, to be a Sequence, a class must be declared as such, either by inheriting from Sequence or being registered as a subclass.

A numpy array is a sequence, but it is not a Sequence as it is not registered as a subclass of Sequence. Nor should it be, as it does not implement the full interface promised by Sequence (things like count() and index() are missing).

It sounds like you want is a structured type for a sequence (small s). As of 3.8 this is possible by using protocols. Protocols define a set of methods which a class must implement to be considered a subclass of the protocol (a la structural typing).

from typing import Protocol
import numpy as np

class MySequence(Protocol):
    def __getitem__(self, index):
        raise NotImplementedError
    def __len__(self):
        raise NotImplementedError
    def __contains__(self, item):
        raise NotImplementedError
    def __iter__(self):
        raise NotImplementedError

def f(s: MySequence):
    for i in range(len(s)):
        print(s[i], end=' ')
    print('end')

f([1, 2, 3, 4]) # should be fine
arr: np.ndarray = np.arange(5)
f(arr) # also fine
f({}) # might be considered fine! Depends on your type checker

Protocols are fairly new, so not all IDEs/type checkers might support them yet. The IDE I use, PyCharm, does. It doesn't like f({}), but it is happy to consider a numpy array a Sequence (big S) though (perhaps not ideal). You can enable runtime checking of protocols by using the runtime_checkable decorator of typing. Be warned, all this does is individually check that each of the Protocols methods can be found on the given object/class. As a result, it can become quite expensive if your protocol has a lot of methods.

Dunes
  • 37,291
  • 7
  • 81
  • 97
  • 1
    Note that large-s sequence also is related to the *other* specification for small-s [Sequence Types](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range). Which codify the structural equivalent to nominal, large-s Sequence. – MisterMiyagi Jul 28 '20 at 13:23
  • Related yes, but I think it's also presented as a convenience rather than a definitive definition. That is, it implements the shared set of functionality of the inbuilt sequence types, but is not presented as the defacto definition of a sequence. For instance, it doesn't implement `+` or `*` which are both listed as common operations implemented by sequences. My answer was to point out the laxness of the definition, and nudge towards a custom protocol where OP can define the spec which is required. – Dunes Jul 28 '20 at 13:42
  • Could you enlarge on duck vs structural, please? I don't understand the difference. What does compile vs run time mean here? – Paul Panzer Jul 31 '20 at 05:23
  • Runtime means while the program is actually running. Compile time, is stuff that is run before the runtime. For C-like languages, this is primarily the compiler. For languages like Python, this is primarily linters. With respect to Python, structural typing only really has any impact on linters. Before running your code, the linter will try to detect possible type errors. Using Protocols tells the linter that we only care about whether the class implements the required methods and not its strict nominal type. That is, you care that a class behaves like a sequence, and not its nominal type. – Dunes Jul 31 '20 at 12:35
  • Do you have a reference for the _"if it [...] uses integer indices between 0 and n-1 then it is a sequence"_ requirement? I wasn't able to find anything that actually specified that. – Eric Apr 06 '21 at 11:01
  • @Eric It's only required that `__getitem__` be implemented (and not be a subclass of dict) to be a sequence. But without the `0 <=i – Dunes Apr 08 '21 at 21:00
1

I think the most practical way to define a sequence in Python is 'A container that supports indexing with integers'.

The Wikipedia definition also holds:

a sequence is an enumerated collection of objects in which repetitions are allowed and order does matter.

To validate if an object is a sequence, I would emulate the logic from the Sequence Protocol:

hasattr(test_obj, "__getitem__") and not isinstance(test_obj, collections.abc.Mapping) 
Christiaan Herrewijn
  • 1,031
  • 10
  • 11
0

Per the doc you pasted:

The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().

numpy.ndarray does not implement the Sequence protocol because it does not implement count() or index():

>>> arr = numpy.arange(6)
>>> isinstance(arr, Sequence)
False
>>> arr.count(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'count'
>>> arr.index(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'index'

Contrast to a range:

>>> r = range(6)
>>> isinstance(r, Sequence)
True
>>> r.count(3)
1
>>> r.index(3)
3

If you want to claim that arr is a Sequence you can, by using the register() class method:

>>> Sequence.register(numpy.ndarray)
<class 'numpy.ndarray'>
>>> isinstance(arr, Sequence)
True

but this is a lie, because it doesn't actually implement the protocol (the register() function doesn't actually check for that, it just trusts you):

>>> arr.count(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'count'

so doing this may lead to errors if you pass a numpy.ndarray to a function that expects a Sequence.

Samwise
  • 68,105
  • 3
  • 30
  • 44
  • This doesn't really hold up. Most types registered with `collections.abc.Sequence` don't actually implement all the methods (especially `__reversed__`). Checking `issubclass(Whatever, collections.abc.Sequence)` really only checks if `Whatever` has `collections.abc.Sequence` in its MRO or if `Whatever` (or a superclass) is registered with `collections.abc.Sequence` (or a subclass). There is no interface check whatsoever. – user2357112 Jul 26 '20 at 09:07
  • 1
    Yes; I said this in my answer, but it's worth reiterating that **`Sequence.register` does not actually validate that you implement the protocol.** If a module registers a class with `Sequence` that doesn't actually implement it, such that instances of the class do not behave the way `Sequence`s are supposed to behave, then **this is a lie** and it should be considered a bug in that module IMO. – Samwise Jul 26 '20 at 15:15