I'm using Python 2.7.3.

Consider a dummy class with custom (albeit bad) iteration and item-getting behavior:

class FooList(list):
    def __iter__(self):
        return iter(self)
    def next(self):
        return 3
    def __getitem__(self, idx):
        return 3

Make an example and see the weird behavior:

>>> zz = FooList([1,2,3])

>>> [x for x in zz]
# Hangs because of the self-reference in `__iter__`.

>>> zz[0]
3

>>> zz[1]
3

But now, let's make a function and then do argument unpacking on zz:

def add3(a, b, c):
    return a + b + c

>>> add3(*zz)
6
# I expected either 9 or for the interpreter to hang like the comprehension!

So argument unpacking is somehow getting the items out of zz, but neither by iterating over the object with the iterator it implements nor by doing a poor man's iteration that calls __getitem__ once for each item the object has.

So the question is: how does the syntax add3(*zz) acquire the data members of zz if not by these methods? Am I just missing one other common pattern for getting data members from a type like this?

My goal is to see whether I could write a class that implements iteration or item-getting in such a way that it changes what the argument unpacking syntax means for that class. After trying the two examples above, I'm now wondering how argument unpacking gets at the underlying data and whether the programmer can influence that behavior. Googling for this only gave back a sea of results explaining the basic usage of the *args syntax.

I don't have a use case for needing to do this and I am not claiming it is a good idea. I just want to see how to do it for the sake of curiosity.

Added

Since the built-in types are treated specially, here's an example that derives from object instead: I just maintain an internal list object and implement my own get and set behavior to emulate list.

class FooList(object):
    def __init__(self, lst):
        self.lst = lst
    def __iter__(self): raise ValueError
    def next(self): return 3
    def __getitem__(self, idx): return self.lst.__getitem__(idx)
    def __setitem__(self, idx, itm): self.lst.__setitem__(idx, itm)

In this case,

In [234]: zz = FooList([1,2,3])

In [235]: [x for x in zz]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-235-ad3bb7659c84> in <module>()
----> 1 [x for x in zz]

<ipython-input-233-dc9284300db1> in __iter__(self)
      2     def __init__(self, lst):
      3         self.lst = lst
----> 4     def __iter__(self): raise ValueError
      5     def next(self): return 3
      6     def __getitem__(self, idx): return self.lst.__getitem__(idx)

ValueError:

In [236]: add3(*zz)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-236-f9bbfdc2de5c> in <module>()
----> 1 add3(*zz)

<ipython-input-233-dc9284300db1> in __iter__(self)
      2     def __init__(self, lst):
      3         self.lst = lst
----> 4     def __iter__(self): raise ValueError
      5     def next(self): return 3
      6     def __getitem__(self, idx): return self.lst.__getitem__(idx)

ValueError:

If I instead make sure that iteration terminates and always yields 3, I get the behavior I was trying to play around with in the first case:

class FooList(object):
    def __init__(self, lst):
        self.lst = lst
        self.iter_loc = -1
    def __iter__(self): return self
    def next(self): 
        if self.iter_loc < len(self.lst)-1:
            self.iter_loc += 1
            return 3
        else:
            self.iter_loc = -1
            raise StopIteration
    def __getitem__(self, idx): return self.lst.__getitem__(idx)
    def __setitem__(self, idx, itm): self.lst.__setitem__(idx, itm)

Then I see this, which is what I originally expected:

In [247]: zz = FooList([1,2,3])

In [248]: ix = iter(zz)

In [249]: ix.next()
Out[249]: 3

In [250]: ix.next()
Out[250]: 3

In [251]: ix.next()
Out[251]: 3

In [252]: ix.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-252-29d4ae900c28> in <module>()
----> 1 ix.next()

<ipython-input-246-5479fdc9217b> in next(self)
     10         else:
     11             self.iter_loc = -1
---> 12             raise StopIteration
     13     def __getitem__(self, idx): return self.lst.__getitem__(idx)
     14     def __setitem__(self, idx, itm): self.lst.__setitem__(idx, itm)

StopIteration:

In [253]: ix = iter(zz)

In [254]: ix.next()
Out[254]: 3

In [255]: ix.next()
Out[255]: 3

In [256]: ix.next()
Out[256]: 3

In [257]: ix.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-257-29d4ae900c28> in <module>()
----> 1 ix.next()

<ipython-input-246-5479fdc9217b> in next(self)
     10         else:
     11             self.iter_loc = -1
---> 12             raise StopIteration
     13     def __getitem__(self, idx): return self.lst.__getitem__(idx)
     14     def __setitem__(self, idx, itm): self.lst.__setitem__(idx, itm)

StopIteration:

In [258]: add3(*zz)
Out[258]: 9

In [259]: zz[0]
Out[259]: 1

In [260]: zz[1]
Out[260]: 2

In [261]: zz[2]
Out[261]: 3

In [262]: [x for x in zz]
Out[262]: [3, 3, 3]

Summary

  1. The syntax *args relies on iteration only. For built-in types this happens in a way that is not directly overridable in classes that inherit from the built-in type.

  2. These two are functionally equivalent:

    foo(*[x for x in args])

    foo(*args)

  3. These are not equivalent, even for finite data structures:

    foo(*args)

    foo(*[args[i] for i in range(len(args))])
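A minimal sketch of points 2 and 3, using a hypothetical `ThreesView` class (not one of the classes above) that derives from `object`, so none of the built-in special-casing is involved. A generator-based `__iter__` is assumed here so that the snippet runs unchanged on Python 2 and Python 3:

```python
class ThreesView(object):
    """Hypothetical wrapper: iteration yields 3s, indexing returns the real items."""

    def __init__(self, lst):
        self.lst = lst

    def __len__(self):
        return len(self.lst)

    def __iter__(self):
        # A generator returns a fresh iterator on every call,
        # yielding 3 once per stored item.
        for _ in self.lst:
            yield 3

    def __getitem__(self, idx):
        return self.lst[idx]


def add3(a, b, c):
    return a + b + c


zz = ThreesView([1, 2, 3])

via_star = add3(*zz)                                 # uses __iter__
via_comprehension = add3(*[x for x in zz])           # also uses __iter__
via_index = add3(*[zz[i] for i in range(len(zz))])   # uses __getitem__

print(via_star, via_comprehension, via_index)  # 9 9 6 on Python 3
```

The first two calls agree because both go through `__iter__`; the third differs because it bypasses iteration entirely.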

ely
  • I think you'll discover more interesting information if you derive `FooList` from `object` rather than `list`. – Robᵩ Oct 22 '13 at 19:15
  • This doesn't hang because of infinite iteration; it hangs because `__iter__(self)` indirectly calls itself recursively, so it takes infinite recursion to even _get_ an iterator. You can remove the `next` entirely and get the same behavior… – abarnert Oct 22 '13 at 19:15
  • I also created a separate `FooListIter` class and had `__iter__` return an instance of that, with the same `next` behavior as above, and it does the same. But thank you for the point about `iter` calling itself here. – ely Oct 22 '13 at 19:17
  • You could add some debug statements to all of your methods, like `print 'now executing next'` ... – Bas Swinckels Oct 22 '13 at 19:17
  • @EMS: You can change what argument-unpacking does for a class, but only by changing what iteration does for the class. That is, you can't make a class that does one thing for argument-unpacking and a different thing for other iteration. – BrenBarn Oct 22 '13 at 19:24
  • That was my suspicion, but it seems false for `list` (and other built-in types). Here, in fact, my `iter` implementation is totally wrong. If I ask for `iter(zz)` it hangs. But `*zz` still works, I guess because it ignores my implemented `__iter__` and uses `list`'s `__iter__` regardless? I agree if I inherited from `object` instead, and then referenced an internal data member that was a `list` it should hang. I was surprised it did not directly here. – ely Oct 22 '13 at 19:26
  • @EMS: Everything I'm saying only applies for user-defined types that don't inherit from builtin types (other than `object`). The builtin types have their own iteration/unpacking behavior which is basically functionally equivalent to the iterator protocol, but doesn't actually use the iterator protocol, so you can't reliably override or customize the behavior by subclassing builtin types and trying to define `__iter__` and stuff. – BrenBarn Oct 22 '13 at 19:53
  • Thanks @BrenBarn. Your most recent comment is precisely the answer I was looking for. – ely Oct 22 '13 at 19:54
  • Related: [Change what the *splat and **splatty-splat operators do to my object](https://stackoverflow.com/q/22365847/674039) – wim Sep 08 '17 at 04:23

1 Answer

You have been bitten by one of Python's most irritating warts: builtin types and subclasses of them are treated magically in some places.

Since your type subclasses from list, Python magically reaches into its internals to unpack it. It doesn't use the real iterator API at all. If you insert print statements inside your next and __getitem__, you'll see that neither one is being called. This behavior cannot be overridden; instead, you would have to write your own class that reimplements the builtin types. You could try using UserList; I haven't checked whether that would work.
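A quick sketch of the `UserList` route, offered as an illustration rather than a tested recommendation. The class name `FooUserList` is made up, and the import fallback is only an assumption so that the snippet runs on both Python 2 (`UserList.UserList`) and Python 3 (`collections.UserList`). Since `UserList` is an ordinary class that wraps a real list in its `data` attribute, an overridden `__iter__` should be what unpacking ends up using:

```python
try:
    from collections import UserList  # Python 3
except ImportError:
    from UserList import UserList     # Python 2


class FooUserList(UserList):
    """Hypothetical example class: iteration yields 3s, indexing is untouched."""

    def __iter__(self):
        # self.data is the underlying list kept by UserList.
        for _ in self.data:
            yield 3


def add3(a, b, c):
    return a + b + c


zz = FooUserList([1, 2, 3])

unpacked = add3(*zz)   # goes through the overridden __iter__
first = zz[0]          # goes through UserList.__getitem__

print(unpacked)  # 9
print(first)     # 1
```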

The answer to your question is that argument unpacking uses iteration. However, iteration itself can use __getitem__ if there is no explicit __iter__ defined. You can't make a class that defines argument-unpacking behavior that is different from the normal iteration behavior.
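To illustrate the `__getitem__` fallback, here is a minimal sketch with a hypothetical `GetItemOnly` class that defines no `__iter__` at all. Iteration, and therefore `*` unpacking, then calls `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised:

```python
class GetItemOnly(object):
    """Hypothetical class with no __iter__, only __getitem__."""

    def __init__(self, lst):
        self.lst = lst
        self.calls = []  # records every index that was requested

    def __getitem__(self, idx):
        self.calls.append(idx)
        # The IndexError raised by the underlying list for an
        # out-of-range index is what ends the fallback iteration.
        return self.lst[idx] * 10


def add3(a, b, c):
    return a + b + c


zz = GetItemOnly([1, 2, 3])
total = add3(*zz)

print(total)     # 60
print(zz.calls)  # [0, 1, 2, 3]
```

The final index 3 in `calls` is the probe that raised `IndexError` and stopped the iteration.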

The iterator protocol (basically "how __iter__ works") shouldn't be assumed to apply to types that subclass builtin types like list. If you subclass a builtin, your subclass may magically behave like the underlying builtin in certain situations, without making use of your customized magic methods (like __iter__). If you want to customize behavior fully and reliably, you can't subclass from builtin types (except, of course, object).

BrenBarn
  • So for `list`, `tuple`, and `dict`, I cannot interrupt whatever internals are being called to get the data members during argument unpacking? – ely Oct 22 '13 at 19:18
  • @EMS: Why would you want to? If you don't want `list`/`tuple`/`dict` magic behavior, don't inherit from them; you can just keep a `list`/`tuple`/`dict` member and delegate to it (exactly where you want, and not where you don't want). – abarnert Oct 22 '13 at 19:19
  • I didn't say that I wanted to. – ely Oct 22 '13 at 19:19
  • @EMS: You could try using `UserList` or `UserDict`. – BrenBarn Oct 22 '13 at 19:20
  • @EMS: Then why did you ask that question? It's like asking "Can I monkeypatch methods on builtin types" and then saying you didn't want to monkeypatch methods on builtin types. – abarnert Oct 22 '13 at 19:21
  • That's exactly right. I want to know what I can do, not because I want to then do it but just to know what's possible. – ely Oct 22 '13 at 19:22
  • If `__iter__` exists but raises an exception, will Python abort, or will it fall back to trying `__getitem__` instead? – Corley Brigman Oct 22 '13 at 19:34
  • It looks like the unpacking syntax just relies on `iter` behavior. I'll add a proper `object` version to my question to demonstrate. – ely Oct 22 '13 at 19:38