
I need to override a method of a parent class, which is a generator, and am wondering about the correct way to do this. Is there anything wrong with the following, or is there a more efficient way?

class A:
    def gen(self):
        yield 1
        yield 2

class B(A):
    def gen(self):
        yield 3
        for n in super().gen():
            yield n
Simon Bergot
ejm

5 Answers


For Python 3.3 and up, the best, most general way to do this is:

class A:
    def gen(self):
        yield 1
        yield 2

class B(A):
    def gen(self):
        yield 3
        yield from super().gen()

This uses the yield from syntax (new in 3.3) for delegating to a subgenerator. It's better than the other solutions because it actually hands control to the generator it delegates to: if that generator supports .send and .throw for passing values and exceptions into a generator, delegating means the subgenerator actually receives them. Explicitly looping and yielding one by one would receive the values in the gen wrapper, not in the generator actually producing the values, and the same problem applies to other solutions such as itertools.chain.
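The point about .send can be made concrete with a small sketch (the class names here are illustrative, not from the question):

```python
class Inner:
    def gen(self):
        received = yield 1   # a value passed in via .send() lands here
        yield received

class Delegating(Inner):
    def gen(self):
        yield from super().gen()   # .send() is forwarded to Inner.gen

class Looping(Inner):
    def gen(self):
        for n in super().gen():    # .send() stops at this wrapper
            yield n

d = Delegating().gen()
next(d)                  # advance to the first yield
print(d.send('hello'))   # prints hello: the value reached Inner.gen

l = Looping().gen()
next(l)
print(l.send('hello'))   # prints None: the wrapper swallowed the value
```

With the explicit loop, the sent value resumes the wrapper's own `yield n` expression and is discarded; Inner.gen only ever sees None.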

ShadowRanger
  • This seems incomplete: if you append `return 4` to the definition of `A.gen`, then applying `next()` three times on `A().gen()` raises `StopIteration(4)`, while applying `next()` four times on `B().gen()` raises `StopIteration()`; i.e., the return value is lost. `itertools.chain` has the same problem. – Erik Carstensen May 25 '22 at 13:24
  • @ErikCarstensen: `StopIteration` *allows* values to be provided, but no part of Python really treats them as important. They *can't* propagate `A.gen`'s original `StopIteration` in any event, because `yield from` is not necessarily the last thing done in `B.gen`. It's *extremely* rare to rely on the return value of a generator in any event; if you do, delegation just won't work without slow, ugly hacks (manually `next`ing over and over, just so you can catch and optionally rethrow the `StopIteration`). – ShadowRanger May 25 '22 at 13:37
  • `StopIteration` values are commonly used in the context of `asyncio`, because `asyncio.Future.__await__` is a generator that returns the future's result. I got here because I needed to override it today; with a bare `yield from` in my override, `print(await my_future)` in a coroutine would always print `None`. – Erik Carstensen May 25 '22 at 14:04
  • @ErikCarstensen: If you only have *one* thing, you just return the underlying generator (or `return` the result of `await`ing it), rather than use `yield from`. `yield from` is for when you need to generate multiple items from different sources; if you just have one lazily produced value, return the raw awaitable or `await` it yourself. – ShadowRanger May 25 '22 at 14:22
  • I don't quite understand your suggestion. `__await__` yields zero or one times, and then returns a value. I want to override it with an implementation that performs side effects during iteration. I don't know how to do that with `await`. – Erik Carstensen May 25 '22 at 20:22
  • Since `__await__` is a generator that both yields and returns, I needed a way of overriding it that covers this. Your answer does not and is hence not the most general way to do it. I wrote an answer [below](https://stackoverflow.com/a/72378951/4869375); perhaps this is a too obscure corner to deserve that level of emphasis, but maybe return values would deserve a footnote mention in your answer? – Erik Carstensen May 25 '22 at 20:27

What you have looks fine, but is not the only approach. What's important about a generator function is that it returns an iterable object. Your subclass could thus instead directly create an iterable, for example:

import itertools

class B(A):
    def gen(self):
        return itertools.chain([3], super().gen())

The better approach is going to depend on exactly what you're doing; the above looks needlessly complex, but I wouldn't want to generalize from such a simple example.
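A quick sanity check (reusing the A from the question) confirms that the chained iterable yields the same sequence as the generator version:

```python
import itertools

class A:
    def gen(self):
        yield 1
        yield 2

class B(A):
    def gen(self):
        # Return an iterable directly instead of defining a generator:
        return itertools.chain([3], super().gen())

print(list(B().gen()))   # prints [3, 1, 2]
```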

Michael J. Barber
  • @J.Barber +1 Your answer is better than mine: the use of **super()** is lighter. But the correct syntax is ``return itertools.chain([3], super(B,self).gen())``. Moreover, **super()** only works for new-style classes; that means that class **A** must be defined with ``class A(object)`` – eyquem Nov 10 '11 at 14:51
  • @eyquem The syntax used in the question, and my answer, is for Python 3. There are no classic classes in Python 3, so `class A` is equivalent to `class A(object)`. – Michael J. Barber Nov 10 '11 at 15:08
  • @eyquem: In Python 3, `super()` is perfectly fine (recommended, in fact), and old-style classes are gone. And since the OP used `super()` himself, it's safe to assume 3.x. – Nov 10 '11 at 15:08
  • @Barber Thank you for these precisions. I don't think enough about the differences between Python 3 and Python 2. There was indeed a tag signalling that this is Python 3, but I didn't notice it. The syntax now present in the question isn't the one that was initially used; it was ``for n in A.gen():`` – eyquem Nov 10 '11 at 16:12
  • @delnan The question now shows ``super().gen()``, but only because it has been edited. I wouldn't have written such a long answer if I hadn't read ``A.gen()``. – eyquem Nov 10 '11 at 16:16
  • itertools.chain() is helpful, thank you. I think I will have to use the for loop way, because I need to do other things in the method rather than returning a generator straight away, but I'll probably find another use for chain. – ejm Nov 12 '11 at 12:15

To call a method of the parent class from a subclass, use super().

New Source Code:

class B(A):
    def gen(self):
        yield 3
        for n in super().gen():
            yield n

This:

b = B()
for i in b.gen():
     print(i)

produces the output:

   3
   1
   2

On the first iteration your generator stops at '3'; for the following iterations it simply continues as the superclass normally would.

This question provides a really good and lengthy explanation of generators, iterators and the yield keyword: What does the "yield" keyword do in Python?

Ria
  • Thanks for your reply. My question is about the use of yield in the second class, not the use of super(), but I've updated to use better code. – ejm Nov 10 '11 at 10:54
  • I hope it's a bit more helpful. – Ria Nov 10 '11 at 11:09
  • @Ria The correct syntax is ``return itertools.chain([3], super(B,self).gen())``. Moreover, **super()** only works for new-style classes; that means that class **A** must be defined with ``class A(object)``. +1 anyway for super() – eyquem Nov 10 '11 at 14:48

If A.gen() may also contain a return statement, then you need to make sure your override returns with a value as well. This is easiest done as follows:

class A:
    def gen(self):
        yield 1
        return 2

class B(A):
    def gen(self):
        yield 3
        ret = yield from super().gen()
        return ret

This gives:

>>> i = A().gen()
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 2
>>> i = B().gen()
>>> next(i)
3
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 2

Without the explicit return statement, the last line would be StopIteration instead of StopIteration: 2.
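If the caller needs that return value, one way to retrieve it from ordinary driving code is to catch the StopIteration by hand; collect below is an illustrative helper, not part of the answer:

```python
class A:
    def gen(self):
        yield 1
        return 2

class B(A):
    def gen(self):
        yield 3
        ret = yield from super().gen()
        return ret

def collect(gen):
    """Exhaust a generator, returning (yielded items, return value)."""
    items = []
    try:
        while True:
            items.append(next(gen))
    except StopIteration as exc:
        # The generator's return value rides along on the exception.
        return items, exc.value

print(collect(B().gen()))   # prints ([3, 1], 2)
```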

Erik Carstensen

Your code is correct.
Or rather, I don't see a problem in it, and it apparently runs correctly.

The only thing I can think of is the following.

Post-scriptum

For new-style classes, see the other answers that use super().
But super() only works for new-style classes, so this answer is useful only for classic-style classes.

When the interpreter arrives at the instruction for n in A.gen(self):, it must find the function A.gen.

The notation A.gen doesn't mean that the object A.gen is INSIDE the object A.
The object A.gen is SOMEWHERE in memory, and the interpreter knows where to find it by obtaining the needed information (an address) from A.__dict__['gen'], where A.__dict__ is the namespace of A.
So, finding the function object A.gen in memory requires a lookup in A.__dict__.

But to perform this lookup, the interpreter must first find the object A itself.
So, when it arrives at the instruction for n in A.gen(self):, it first checks whether the identifier A is among the local identifiers, that is to say it searches for the string 'A' in the local namespace of the function.
Since it is not there, the interpreter goes outside the function and searches for this identifier at the module level, in the global namespace (which is globals()).

At this point, the global namespace may have hundreds or thousands of attribute names among which to perform the lookup for 'A'.

However, A has very few attributes: its __dict__'s keys are only '__module__', 'gen' and '__doc__' (to see that, print A.__dict__).
So it would be a pity if the little search for the string 'gen' in A.__dict__ had to wait on a search among hundreds of items in the dictionary-namespace globals() of the module level.

That's why I suggest another way to let the interpreter find the function A.gen:

class A:
    def gen(self):
        yield 1
        yield 2

class BB(A):
    def gen(self):
        yield 3
        for n in self.__class__.__bases__[0].gen(self):
            yield n


bb = BB()
print list(bb.gen())  # prints [3, 1, 2]

self.__class__ is the class from which the instance was instantiated, that is to say BB.

self.__class__.__bases__ is a tuple containing the base classes of BB.
Presently there is only one element in this tuple, so self.__class__.__bases__[0] is A.

__class__ and __bases__ are names of special attributes that aren't listed in __dict__;
in fact, __class__, __bases__ and __dict__ are special attributes of a similar nature: they are Python-provided attributes; see:
http://www.cafepy.com/article/python_attributes_and_methods/python_attributes_and_methods.html

Well, what I mean, in the end, is that there are few elements in self.__class__ and in self.__class__.__bases__, so it is rational to think that the successive lookups in these objects to finally reach A.gen will be faster than the lookup for 'A' in the global namespace, in case that one contains hundreds of elements.

Maybe that's trying to do too much optimization, maybe not.
This answer is mainly meant to give information on the underlying mechanisms involved, which I personally find interesting to know.

Edit

You can obtain the same result as your code with a more concise instruction:

class A:
    def gen(self):
        yield 1
        yield 2

class Bu(A):
    def gen(self):
        yield 3
        for n in A.gen(self):
            yield n

b = Bu()
print 'list(b.gen()) ==',list(b.gen())

from itertools import chain
w = chain(iter((3,)),xrange(1,3))
print 'list(w)       ==',list(w)

produces

list(b.gen()) == [3, 1, 2]
list(w)       == [3, 1, 2]
eyquem
  • Thanks for the detailed information about use of super and optimizing to find the parent class method. Will certainly keep that in mind. – ejm Nov 12 '11 at 12:06
  • There are some misconceptions when you talk about the performance of `A.gen(self)` vs. `self.__class__.__bases__[0].gen(self)`. 1) Identifying `A` as being non-local is done when the method is compiled; the locals are never searched when the method is called (it checks globals, then builtins if globals don't have it). 2) The fact that `globals()` might contain hundreds of entries is largely irrelevant; `dict` lookup is average case `O(1)`; the size of the `dict` doesn't meaningfully impact lookup speed. 3) There's some crazy stuff involved in attribute lookup too (cont.) – ShadowRanger Feb 17 '22 at 20:13
  • (it checks the class namespace first, then the instance namespace, both of which involve `dict` lookups, where looking up a global only costs one `dict` lookup, and as noted, the expense of a `dict` lookup is largely unrelated to its size, so the global lookup is cheaper). As a result, `self.__class__.__bases__[0].gen(self)` involves three attribute lookups (3-6 `dict` lookups depending on some internal optimizations) and one indexing operation (more expensive than you might think), where `A.gen(self)` only involves just two `dict` lookups total (one from globals, one from attribute lookup). – ShadowRanger Feb 17 '22 at 20:18
  • In short, **the "optimization" you suggest is actually a significant pessimization**; the only advantage it has is a stylistic one (avoiding naming the parent class explicitly anywhere but in the `class B(A):` declaration, helpful for refactoring), ***and* it comes with a major problem:** If you make a `class C(B):`, you'll end up with an infinite loop (`self.__class__.__bases__[0]` when `self` is an instance of `C` will always return `B`, so you'll recursively invoke `B.gen` until the program dies with a `RecursionError`). – ShadowRanger Feb 17 '22 at 20:22
  • That major problem is what `super()` exists to address; by knowing the class the method was defined in (implicitly through some ugly internal hacks on Py3, explicitly through the user providing it on Py2 new-style classes), it can skip itself and all overrides earlier in the chain even when called with subclass instances (`super().methodname` proxies to the first `methodname` in the C3 linearized MRO of `self`'s class that comes *after* `B` in the MRO, guaranteeing no recursive invocation if all layers use `super` correctly, which is pretty easy to do in Py3 with no-arg `super()`). – ShadowRanger Feb 17 '22 at 20:30
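The subclassing pitfall described in these comments can be seen in a short sketch (a minimal reconstruction, not code from the answer):

```python
class A:
    def gen(self):
        yield 1

class B(A):
    def gen(self):
        yield 2
        # Hard-codes "one level up from the *instance's* class", not
        # "one level up from the class defining this method":
        for n in self.__class__.__bases__[0].gen(self):
            yield n

class C(B):
    pass

# For a C instance, self.__class__.__bases__[0] is B, so B.gen keeps
# re-entering itself, one stack frame deeper per item, until Python
# hits its recursion limit:
try:
    list(C().gen())
except RecursionError:
    print("RecursionError")
```

With `super().gen()` instead, the call is resolved through the MRO of the instance's class starting after B itself, so B.gen never re-enters itself and the loop terminates.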