The biggest reason is that it reduces type safety.
How many programs have you written where you actually needed to consume an iterable just to know how many elements it had, throwing away anything else?
I, in quite a few years of coding in Python, never needed that. It's a non-sensical operation in normal programs. An iterator may not have a length (e.g. infinite iterators or generators that expects inputs via send()
), so asking for it doesn't make much sense. The fact that len(an_iterator)
produces an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len
on the wrong thing, or maybe your function actually needs a sequence instead of an iterator as you expected.
Removing such errors would create a new class of bugs where people, calling len
, erroneously consume an iterator, or use an iterator as if it were a sequence without realizing.
If you really need to know the length of an iterator, what's wrong with len(list(iterator))
? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time this simply means that something with your code is wrong, because such an operation doesn't make much sense.
The second reason is that, with that change, you are violating two nice properties of len
that currently hold for all (known) containers:
It is known to be cheap on all containers ever implemented in Python (all built-ins, standard library, numpy
& scipy
and all other big third party libraries do this on both dynamically sized and static sized containers). So when you see len(something)
you know that the len
call is cheap. Making it work with iterators would mean that suddenly all programs might become inefficient due to computations of the length.
Also note that you can, trivially, implement O(1) __len__
on every container. The cost to pre-compute the length is often negligible, and generally worth paying.
The only exception would be if you implement immutable containers that have part of their internal representation shared with other instances (to save memory). However, I don't know of any implementation that does this, and most of the time you can achieve better than O(n) time anyway.
In summary: currently everybody implements __len__
in O(1) and it's easy to continue to do so. So there is an expectation for calls to len
to be O(1). Even if it's not part of the standard. Python developers intentionally avoid C/C++'s style legalese in their documentation and trust the users. In this case, if your __len__
isn't O(1), it's expected that you document that.
It is known to be not destructive. Any sensible implementation of __len__
doesn't change its argument. So you can be sure that len(x) == len(x)
, or that n = len(x);len(list(x)) == n
.
Even this property is not defined in the documentation, however it's expected by everyone, and currently, nobody violates it.
Such properties are good, because you can reason and make assumptions about code using them.
They can help you ensure the correctness of a piece of code, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and understand whether it's correct or what would be it's complexity, because you have to keep in mind the special cases.
In summary, the change you are proposing has one, really small, pro: saving few characters in very particular situations, but it has several, big, disadvantages which would impact a huge portion of existing code.
An other minor reason. If len
consumes iterators I'm sure that some people would start to abuse this for its side-effects (replacing the already ugly use of map
or list-comprehensions). Suddenly people can write code like:
len(print(something) for ... in ...)
to print text, which is really just ugly. It doesn't read well. Stateful code should be relagated to statements, since they provide a visual cue of side-effects.