6

In Python 2, the built-in function map seems to call __len__ when it is overridden. Is that correct, and if so, why are we computing the length of the iterable we are mapping over? Iterables aren't required to define __len__ at all, and map works even when the iterable defines no length.

map is documented here; the documentation does specify length-dependent behaviour in the event that multiple iterables are passed. However,

  • I'm interested in the case that only one iterable is passed
  • Even if multiple iterables were passed (not my question), it seems like an odd design choice to explicitly check the length instead of just iterating until you run out and then returning None (quick illustration below)
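
(For reference, the length-dependent behaviour I mean is Python 2's padding rule for multiple iterables; the lambda here is a throwaway, purely to illustrate:)

>>> map(lambda x, y: (x, y), [1, 2, 3], ['a', 'b'])
[(1, 'a'), (2, 'b'), (3, None)]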

I am concerned because according to several extremely highly upvoted questions (1, 2),

map(f, iterable)

is basically equivalent to:

[f(x) for x in iterable]

But I am running into simple examples where that isn't true.

For Example

class Iterable:

    def __iter__(self):
        # each call to iter() (re)builds the underlying iterator
        self.iterable = iter([1, 2, 3, 4, 5])
        return self

    def next(self):  # Python 2 iterator protocol
        return self.iterable.next()

    # Uncomment to see map's behaviour change:
    # def __len__(self):
    #     self.iterable = None   # deliberately mutate state as a side effect
    #     return 5


def foo(x): return x

print( [foo(x) for x in Iterable()] )
print( map(foo, Iterable()) )

Behaves as it should, but if you uncomment the overriding of __len__, it very much does not.

In this case, it raises an AttributeError because self.iterable has been set to None. While the behaviour of this toy class is silly, I see no requirement of invariance in the specification of __len__. Surely it's good practice not to modify state in a call to __len__, but the reason should not be unexpected behaviour in built-in functions. In more realistic cases, my __len__ may just be slow, and I don't expect to have to worry about it being called by map, or maybe it isn't thread safe, etc.
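
To make the "slow __len__" concern concrete, here's a contrived sketch (the SlowLen class is hypothetical, purely for illustration): in Python 2, map(foo, SlowLen()) pays the cost of __len__ once before it even starts iterating, even though plain iteration never touches it.

class SlowLen(object):

    def __init__(self):
        self.data = [1, 2, 3, 4, 5]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        # stand-in for an O(n) or otherwise expensive length computation
        total = 0
        for _ in self.data:
            total += 1
        return total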


Implementation Dependent?

Since map is a builtin function, it may have implementation-specific features outside the spec, but CPython implements it on line 918 of bltinmodule.c, which indeed states:

/* Do a first pass to obtain iterators for the arguments, and set len
 * to the largest of their lengths.
 */

It then calls _PyObject_LengthHint, which is defined in Objects/abstract.c and does indeed look for an overridden __len__. This doesn't clarify to me whether the behaviour is just implementation dependent, or whether I'm missing some reason that map deliberately looks up the iterable's length, against my instinct.
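
For anyone else reading along, here is a rough Python-level sketch of how I read _PyObject_LengthHint (illustrative only; the name length_hint and the default argument are mine, the real thing is C):

def length_hint(obj, default):
    try:
        return len(obj)               # this is where an overridden __len__ fires
    except (TypeError, AttributeError):
        pass
    try:
        hint = obj.__length_hint__()  # optional fallback protocol
    except AttributeError:
        return default
    return hint if hint >= 0 else default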


(Note I haven't tested this in Python 3, which is why I specified Python 2. In Python 3, map returns an iterator rather than a list, so at least a few of my claims don't carry over.)

en_Knight
  • It's an implementation detail, but I think we should be looking at the definition of `__len__`, not `map`. The [2.7 doc](https://docs.python.org/2/reference/datamodel.html#object.__len__) says _"Should return the length of the object, an integer >= 0."_ It's a "should", not a "must", but doesn't good object design include making sure calling `len()` is safe? – tdelaney Dec 02 '15 at 19:39

2 Answers

2

I am concerned because according to several extremely highly upvoted questions (1, 2),

map(f, iterable)

is basically equivalent to:

[f(x) for x in iterable]

But I am running into simple examples where that isn't true.

But calling _PyObject_LengthHint is supposed to be basically equivalent to not calling it. An object's __len__ or __length_hint__ is not supposed to mutate the object like this. You might as well say that map(f, iterable) and [f(x) for x in iterable] are inequivalent because if f uses stack inspection to determine whether it's being called from map and does something different, the two snippets behave differently.

As for why map does this, it's trying to preallocate the list to the right size to avoid needing to resize it. Resizes only slow things down by a constant factor, but if you can avoid the constant factor, why not? It would be perfectly reasonable for list comprehensions to do this in a future Python version.
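
Roughly, the strategy looks like this (a simplified single-iterable sketch, not the actual C code; the name map_sketch and the default of 8 are illustrative):

def map_sketch(f, iterable):
    it = iter(iterable)              # first pass: obtain the iterator...
    try:
        hint = len(iterable)         # ...and a length hint (this calls __len__)
    except (TypeError, AttributeError):
        hint = 8                     # arbitrary default when there is no hint
    result = [None] * hint           # preallocate to avoid repeated resizing
    i = 0
    for item in it:
        value = f(item)
        if i < hint:
            result[i] = value        # fill a preallocated slot
        else:
            result.append(value)     # hint was too small: grow normally
        i += 1
    del result[i:]                   # hint was too big: trim the excess
    return result

Note that with your Iterable above, the len(iterable) call is exactly the step that clobbers self.iterable.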

user2357112
  • The last paragraph seems solid to me, but I'm not sold on your analogy in the first paragraph; if I explicitly undermine the procedure then sure it won't work, but why would I expect python to call a custom function from a built-in one, without specifying it? "But calling _PyObject_LengthHint is supposed to be basically equivalent to not calling it" what does this mean? Calling len is not guaranteed to be constant time for an arbitrary iterable, right, so the pre-allocation is only saving time if it doesn't wind up walking through the iterable an extra time. What am I missing? – en_Knight Dec 02 '15 at 19:25
  • @en_Knight: "Calling len is not guaranteed to be constant time for an arbitrary iterable, right" - not guaranteed, but it's very strongly expected to be constant time. The constant-factor speedup in the common case is worth the constant-factor slowdown in the highly uncommon case. "why would I expect python to call a custom function from a built-in one, without specifying it" - `map` just says it's going to call the function on the elements of the iterable and return a list of results. It doesn't say anything about how it interacts with the iterable... – user2357112 Dec 02 '15 at 19:44
  • Heck, it could call `keys` if it detects that the iterable is a mapping, and it actually does call `__getitem__` if the argument has no `__iter__`. – user2357112 Dec 02 '15 at 19:51
  • okay, what I'm getting from this is that my use of len was flagrant enough that the side effect was my fault... It's unsatisfying for slow length calculations, but that's fine. The call to getitem is a good point, though I'd point out that it's well documented that getitem is called to define iterables when next isn't available. (I'll accept this if nothing better comes along) – en_Knight Dec 02 '15 at 19:55
0

I'm not quite sure what you're asking here. I'm going to assume that your question is "Why is the result of map(f, iterable) not always equivalent to [f(x) for x in iterable]?"

From your research, it's clear that there is a degree of implementation dependence in the builtin map function, which (while it does come off as a bit strange) makes total sense for a custom implementation of an iterable object.

The point about the specification of __len__ being a bit too lax is a good one, though. It seems that modifying object state should be labelled as very bad in a method like this.

However, it does seem like your interpretation of the equivalence between map(f, iterable) and [f(x) for x in iterable] might be making incorrect assumptions. The equivalence holds, of course, only in cases where the implementations of f, map, and iterable do not modify the underlying mechanisms of the evaluation. When they do, all bets are off. Basically, context is important. Take this, for instance:

>>> def map(function, iterable):  # shadows the builtin map in this session
...     return None
...
>>> def some_fun(x):
...     return x + 1
...
>>> a = [1, 2, 3]
>>> print(map(some_fun, a))
None
>>> [some_fun(x) for x in a]
[2, 3, 4]

Here, obviously, the result of the map call and the list comprehension are not the same thing (if you only look at the last two evaluations). Heck, they do completely different things. This is entirely due to context. Therefore, unless clearly stated otherwise, in most cases it is completely reasonable to assume that neither map's nor the iterable's implementation circumvents Python's mechanisms. However, if there is context, it comes first.

Juan Carlos Coto
  • This generally seems like a good answer to me, though your example seems misfitting. In your example, the builtin map function is not doing something unexpected - it isn't being called. Indeed, list comprehension isn't equivalent to all objects named map, just the builtin one. My issue is in "explicit is better than implicit"; "iterable's implementation does not circumvent the mechanisms of python" - I was surprised that the "mechanisms of python" were doing things so subtly when it seemed unexpected. "It's not unexpected," as you and user235 seem to agree, is a fine explanation – en_Knight Dec 02 '15 at 20:23