You are correct on several points, including when you ask "If so it must be the same resource requirements for both code?": yes, both forms use essentially the same resources, and whether one turns out faster than the other depends on implementation details rather than on anything fundamental.
For example, since pure Python iterators written as a class implementing __next__ require Python function calls, they are likely slower than a generator function using yield up to Python 3.10, but not necessarily on Python 3.11, where the overhead of function calls was reduced. OTOH, on PyPy there should be no difference between Python code written by the user in a __next__ method and the internal code executed by the runtime for generator functions.
In an ideal world, they would perform the same. And the "big O" algorithmic factors for both forms are certainly the same in any (reasonable) Python implementation.
This leaves us with the form differences:
Indeed, when one writes a function that includes the yield keyword in its body (even if it is in an unreachable section of the code), the function is, at compile time, created as a "generator function": this means that when it is called, none of the visible code in it is executed. Instead, Python creates an instance of a "generator object", which is returned. This object features the methods __next__, send and throw, which can subsequently be used to drive the generator: in this sense a generator works the same as a user-implemented iterator.
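A minimal sketch of that behaviour, for illustration only: the names countdown, never_yields and log are made up for this example, and log is just a list used to record when the function body actually runs.

```python
import inspect

log = []  # records when the generator body actually starts running


def countdown(n):
    log.append("body started")  # runs on the first next(), not on the call
    while n > 0:
        yield n
        n -= 1


def never_yields():
    return "done"
    yield  # unreachable, yet its presence makes this a generator function


gen = countdown(2)              # only creates the generator object
ran_on_call = bool(log)         # nothing in the body has executed yet

# The generator object exposes the methods used to drive it.
has_protocol = all(hasattr(gen, name) for name in ("__next__", "send", "throw"))

first = next(gen)               # now the body runs up to the first yield
ran_after_next = bool(log)
remaining = list(gen)           # drives the generator to exhaustion

unreachable_still_generator = inspect.isgeneratorfunction(never_yields)
```

From the caller's side, next(gen) here is indistinguishable from calling __next__ on an instance of a hand-written iterator class.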
As for the output of sys.getsizeof: this is certainly something that should not concern you. The output of this function is not a reliable metric, as it does not include the size of any referenced objects. An instance of a user class will typically have an associated full-size dictionary, for example (although this has also been optimized in recent CPython releases). All in all, the difference in total bytes between a generator created by a generator function and an iterator created by a user class might even be a couple hundred bytes in favor of the generator: but this won't make any difference in most workflows, unless one is creating hundreds (and for large server processes, tens of thousands) of generators to be used in parallel (i.e. creating new ones before older ones have been used up and removed from memory).
And even then, the user class could be optimized (with the use of __slots__ and other techniques).
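To make both points concrete, here is a rough sketch assuming CPython (sys.getsizeof is implementation-specific). CounterBase, PlainCounter and SlottedCounter are hypothetical classes written for this example, not the ones from your question.

```python
import sys

# sys.getsizeof is shallow: it measures the container, not what it references.
payload = "x" * 10_000
holder = [payload]
shallow = sys.getsizeof(holder)       # the list object only
referenced = sys.getsizeof(payload)   # the string the list points to


class CounterBase:
    __slots__ = ()  # the base adds no per-instance storage of its own

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


class PlainCounter(CounterBase):
    pass  # no __slots__ here, so every instance gets a __dict__


class SlottedCounter(CounterBase):
    __slots__ = ("limit", "current")  # fixed attribute slots, no __dict__


plain_has_dict = hasattr(PlainCounter(3), "__dict__")
slotted_has_dict = hasattr(SlottedCounter(3), "__dict__")
slotted_values = list(SlottedCounter(3))
```

The slotted version behaves the same as an iterator; it simply skips the per-instance dictionary, which is where most of the per-instance overhead of a plain class comes from.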
In your comparison, in particular:
print(f'{sys.getsizeof(CustomIterator) + sys.getsizeof(iterator) + sys.getsizeof(next(iterator))}') #1170
You are including the size of the class object itself (sys.getsizeof(CustomIterator)): it is an instance of type, and certainly will use some more memory. The 1000 extra bytes you see are not much: this amount is allocated exactly once (*), and remains in use for the lifetime of your process. Each iterator instance will use another amount of memory, which will be freed when the iterator is no longer used.
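A small sketch separating the two costs, assuming CPython. The CustomIterator below is only a stand-in I wrote for illustration, since the exact class from your question may differ.

```python
import sys


class CustomIterator:  # hypothetical stand-in for the class in the question
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        item = self.data[self.index]
        self.index += 1
        return item


# The class is itself an object (an instance of type), created once.
class_is_type_instance = isinstance(CustomIterator, type)
class_size = sys.getsizeof(CustomIterator)

# Each instance has its own, much smaller, shallow size.
instance_sizes = [sys.getsizeof(CustomIterator("abc")) for _ in range(3)]

values = list(CustomIterator("abc"))
```

Adding class_size to a per-iterator total therefore overstates what each additional iterator costs: only the entries in instance_sizes grow with the number of iterators.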
As for the internal state of a generator created by a generator function, which seems to be the other thing that concerns you: it is of course not magic. It is maintained in an object called an "execution frame", which is even introspectable. When you call a generator function, the returned "generator object" has the .gi_frame attribute, and you can inspect the internal local variables at .gi_frame.f_locals. The same state keeping takes place, in a nested way, when you run for char in s:. The difference there is that the for statement creates an iterator over s which is not directly accessible from Python code. But you could do iter_s = iter(s) followed by for char in iter_s:, and see some of the state you want in the iter_s object (this won't expose internal state, such as the index used as a counter, to Python code, however, but the __next__ method is there.)
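A short sketch of that introspection, assuming CPython. The generator function chars is a made-up example; gi_frame and f_locals are the real attributes mentioned above.

```python
def chars(s):
    for char in s:
        yield char.upper()


gen = chars("abc")

# Before the first next(), the frame already holds the argument.
before = dict(gen.gi_frame.f_locals)

first = next(gen)

# After one step, the loop variable is part of the saved state too.
after = dict(gen.gi_frame.f_locals)

rest = list(gen)
frame_when_exhausted = gen.gi_frame  # the frame is released once finished

# Making the iterator used by a for loop explicit:
s = "abc"
iter_s = iter(s)
seen = []
for char in iter_s:
    seen.append(char)
    break  # stop early; iter_s remembers its position

leftover = list(iter_s)  # continues where the loop stopped
has_next = hasattr(iter_s, "__next__")
```

The string iterator keeps its position internally without exposing it as a Python attribute, but the fact that leftover resumes after the first character shows that the state is there.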
(*) If you happen to put your "class" statement, with its body and all, inside a loop or a function, a new class object will be created each time that code runs, but that would just be incorrect code.