Recently I've been writing a download program that uses the HTTP Range header field to download many blocks of a file at the same time. I wrote a Python class to represent the Range (the HTTP Range header describes a closed interval):
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        return self.end - self.begin + 1
The __iter__ magic method is there to support tuple unpacking:

header = {'Range': 'bytes={}-{}'.format(*the_range)}

And len(the_range) is the number of bytes in that Range.
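For concreteness, here is how I use the class (a toy interactive session; the byte values are made up):

>>> the_range = ClosedRange(0, 1023)
>>> begin, end = the_range        # tuple unpacking via __iter__
>>> 'bytes={}-{}'.format(*the_range)
'bytes=0-1023'
>>> len(the_range)                # bytes in the closed interval [0, 1023]
1024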
Now I've found that 'bytes={}-{}'.format(*the_range) occasionally causes a MemoryError. After some debugging I found that the CPython interpreter calls len(iterable) when executing func(*iterable), and (apparently) allocates memory based on the result. On my machine, the MemoryError appears when len(the_range) is greater than 1 GB.
Here is a simplified reproduction:
class C:
    def __iter__(self):
        yield 5

    def __len__(self):
        print('__len__ called')
        return 1024**3

def f(*args):
    return args
>>> c = C()
>>> f(*c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
>>> # BTW, `list(the_range)` has the same problem.
>>> list(c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
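
To confirm that __len__ is what triggers the big allocation, here is the same class with __len__ removed (the class name D is my own, just for this test); it unpacks without any problem:

class D:
    def __iter__(self):
        yield 5

>>> f(*D())
(5,)
>>> list(D())
[5]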
So my questions are:
1. Why does CPython call len(iterable)? From this question I see that you can't know an iterator's length until you have iterated through it. Is this an optimization?
2. Can the __len__ method return a 'fake' length (i.e. not the real number of elements in memory) of an object?
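
As a side experiment for question 2 (a toy class of my own), a __len__ that is deliberately too small doesn't seem to break unpacking; the extra items still come through, so the length looks like it's only used as a pre-allocation hint:

class Liar:
    def __iter__(self):
        yield 5
        yield 6
        yield 7

    def __len__(self):
        return 1  # deliberately wrong: it actually yields 3 items

>>> f(*Liar())
(5, 6, 7)
>>> list(Liar())
[5, 6, 7]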