Recently I've been writing a download program that uses the HTTP Range header field to download many blocks of a file at the same time. I wrote a Python class to represent the Range (the HTTP Range header describes a closed interval):
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        return self.end - self.begin + 1
The __iter__ magic method is there to support tuple unpacking:

header = {'Range': 'bytes={}-{}'.format(*the_range)}

And len(the_range) is the number of bytes in that Range.
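For concreteness, here is how I use the class (a toy interactive session; the byte values are made up):

>>> the_range = ClosedRange(0, 1023)
>>> begin, end = the_range        # tuple unpacking via __iter__
>>> 'bytes={}-{}'.format(*the_range)
'bytes=0-1023'
>>> len(the_range)                # bytes in the closed interval [0, 1023]
1024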
Now I've found that 'bytes={}-{}'.format(*the_range) occasionally causes a MemoryError. After some debugging I found that the CPython interpreter calls len(iterable) when executing func(*iterable), and (apparently) allocates memory based on the result. On my machine, the MemoryError appears when len(the_range) is greater than 1 GB.
Here is a simplified reproduction:
class C:
    def __iter__(self):
        yield 5

    def __len__(self):
        print('__len__ called')
        return 1024**3

def f(*args):
    return args
>>> c = C()
>>> f(*c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
>>> # BTW, `list(the_range)` has the same problem.
>>> list(c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
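
To confirm that __len__ is what triggers the big allocation, here is the same class with __len__ removed (the class name D is my own, just for this test); it unpacks without any problem:

class D:
    def __iter__(self):
        yield 5

>>> f(*D())
(5,)
>>> list(D())
[5]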
So my questions are:
1. Why does CPython call len(iterable)? From this question I see that you can't know an iterator's length until you have iterated through it. Is this an optimization?
2. Can the __len__ method return a 'fake' length (i.e. not the real number of elements in memory) of an object?
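
As a side experiment for question 2 (a toy class of my own), a __len__ that is deliberately too small doesn't seem to break unpacking; the extra items still come through, so the length looks like it's only used as a pre-allocation hint:

class Liar:
    def __iter__(self):
        yield 5
        yield 6
        yield 7

    def __len__(self):
        return 1  # deliberately wrong: it actually yields 3 items

>>> f(*Liar())
(5, 6, 7)
>>> list(Liar())
[5, 6, 7]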