
So I have a custom iterator, represented by the class CustomIterator below. The last print shows the total size of everything it uses to get one character from a string: 1170 bytes.

import sys

s = 'some text'  # any example string

class CustomIterator:
    def __init__(self, collection: str):
        self.__position = 0
        self.__collection = collection

    def __iter__(self):
        return self

    def __next__(self):
        while self.__position < len(self.__collection):
            char = self.__collection[self.__position]
            self.__position += 1
            return char.upper()
        raise StopIteration

iterator = CustomIterator(s)
print(f'{sys.getsizeof(CustomIterator) + sys.getsizeof(iterator) + sys.getsizeof(next(iterator))}') #1170

I also have a generator, written as a function with the yield statement below. The last print here means the same as for the iterator: 154 bytes.

#Generator
def generator(s: str):
    for char in s:
        yield char.upper()

g = generator(s)
print(f'{sys.getsizeof(generator(s)) + sys.getsizeof(next(g))}') #154

Both pieces of code produce the same results. So how exactly does the yield statement work in Python? I supposed it inherits the __next__ method from a base iterator and overrides it. Is that right? If so, shouldn't both have the same resource requirements?

I tried to find the answer in the docs and in some articles on Google.

Leo
    `while` that returns unconditionally should be `if`. It doesn't actually loop. – Barmar Apr 03 '23 at 15:06
  • And yes, the two approaches are equivalent. When you use `yield`, the state of the function is automatically saved. When you use the iterator class, you're saving the state explicitly in attributes. – Barmar Apr 03 '23 at 15:10
  • Yeah, I understand it's equivalent. But how does yield save the position? For the iterator I clearly know it, since I can see the attribute __position in my class. But yield doesn't show it; I don't get it. And will both versions take the same amount of RAM? – Leo Apr 03 '23 at 15:16
  • It simply makes a snapshot of all the local variables of the function. AFAICT there's no way to view this state. – Barmar Apr 03 '23 at 15:22
  • So is it like a hidden local variable holding the cursor position? – Leo Apr 03 '23 at 15:28
  • Yes, that's exactly what it is. – Barmar Apr 03 '23 at 15:29
  • The execution state of a function is a dynamically allocated object. So a generator can simply be "suspended" similar to a coroutine. – Homer512 Apr 03 '23 at 15:30

1 Answer


You are correct in several respects, including when you ask "If so, shouldn't both have the same resource requirements?" - yes, both forms use essentially the same resources, and whether one happens to be more performant is due more to implementation details than to anything fundamental.

For example, since pure Python iterators written as a class implementing __next__ require Python function calls, they are likely slower than a generator function with yield up to Python 3.10, but not necessarily on Python 3.11, where function-call overhead was reduced. On the other hand, the PyPy implementation should show no difference between user-written Python code in a __next__ method and the internal code the runtime executes for generator functions.

In an ideal world they would perform the same. And the "big O" algorithmic complexity of both forms is certainly the same in any (reasonable) Python implementation.

This leaves us with the differences in form. Indeed, when one writes a function that includes the yield keyword in its body (even in an unreachable section of the code), the function is, at compile time, created as a "generator function": this means that when it is called, none of the visible code in it is executed. Instead, Python creates and returns an instance of a "generator object". This object features the methods __next__, send, and throw, which can subsequently be used to drive the generator: in this sense a generator works the same as a user-implemented iterator.
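A quick sketch of this behavior (the function name here is just illustrative):

```python
def gen():
    print("body running")  # not executed when gen() is called
    yield 1

g = gen()  # nothing printed: the body has not started yet
print(type(g).__name__)  # generator
# The generator object exposes the driving methods:
print(hasattr(g, '__next__'), hasattr(g, 'send'), hasattr(g, 'throw'))  # True True True
print(next(g))  # only now "body running" prints, then 1
```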

As for the output of sys.getsizeof: this is certainly something that should not concern you. The output of this function is not a reliable metric, as it does not include the sizes of any referenced objects. An instance of a user class will typically have an associated full-size dictionary, for example (although this has also been optimized in recent CPython releases). All in all, the difference in total bytes used between a generator created by a generator function and an iterator created by a user class might be a couple hundred bytes in favor of the generator function: but this won't make any difference in most workflows, unless one is creating hundreds (or, for large server processes, tens of thousands) of generators to be used in parallel (i.e. creating new ones before older ones have been fully consumed and removed from memory).
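To see why sys.getsizeof is unreliable for this kind of comparison, note that it reports only the object's own footprint, not that of anything it references:

```python
import sys

big = "x" * 1_000_000   # a megabyte-sized string
container = [big]

print(sys.getsizeof(container))  # small: only the list object itself, a few dozen bytes
print(sys.getsizeof(big))        # over a million bytes: the string's own buffer
```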

And even then, the user class could be optimized (with the use of __slots__ and other techniques).
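A sketch of such an optimization, using a hypothetical SlimIterator that mirrors the class from the question: __slots__ replaces the per-instance dictionary with fixed storage for the named attributes.

```python
class SlimIterator:
    __slots__ = ('_position', '_collection')  # no per-instance __dict__ is allocated

    def __init__(self, collection: str):
        self._position = 0
        self._collection = collection

    def __iter__(self):
        return self

    def __next__(self):
        if self._position < len(self._collection):
            char = self._collection[self._position]
            self._position += 1
            return char.upper()
        raise StopIteration

it = SlimIterator("abc")
print(list(it))  # ['A', 'B', 'C']
# Instances of a __slots__ class have no __dict__ at all:
print(hasattr(SlimIterator("abc"), '__dict__'))  # False
```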

In your comparison, in particular:


print(f'{sys.getsizeof(CustomIterator) + sys.getsizeof(iterator) + sys.getsizeof(next(iterator))}') #1170

You are including the size of the class object itself - it is an instance of type, and will certainly use some extra memory (sys.getsizeof(CustomIterator)) - but the roughly 1000 extra bytes you see are not much: this amount is allocated exactly once (*) and remains in use for the lifetime of your process. Each iterator instance will use another amount of memory, which is freed when the iterator is no longer used.

As for the internal state of a generator created by a generator function, which seems to be the other thing that concerns you: it is of course not magic - it is maintained in an object that is even introspectable, called an "execution frame". When you call a generator function, the returned generator object has a .gi_frame attribute, and you can inspect its local variables at .gi_frame.f_locals. The same state keeping takes place, in a nested way, when you run for char in s:. The difference there is that the for statement creates an iterator over s which is not directly accessible from Python code. But you could write iter_s = iter(s), then for char in iter_s:, and see some of the state you want in the iter_s object (this won't expose internal state, like the variable used as a counter, to Python code, but the __next__ method is there).
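For example, using the generator function from the question, the suspended frame and its saved locals can be inspected directly:

```python
def generator(s: str):
    for char in s:
        yield char.upper()

g = generator("hello")
next(g)  # advance once so the frame has live locals

# The frame holds the "snapshot" of the function's local variables:
print(g.gi_frame.f_locals)  # {'s': 'hello', 'char': 'h'}
# f_lasti is the bytecode offset where execution is suspended:
print(g.gi_frame.f_lasti)
```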

(*) if you happen to put your "class" statement, body and all, inside a loop or a function, it will be executed again each time that code runs - but that would just be incorrect code.

jsbueno