
I am trying to get my head around generators and learn about how to use them. I have seen a lot of examples and get that they produce results one at a time rather than outputting them at once like in a regular function. But all the examples I have seen have involved going through a list and printing values that are generated through the function. What if you want to actually create a list?

For example I have seen an example about even numbers which just generates even numbers and prints them out, but what if I want a list of even numbers like this:

def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i

even_list = []
for i in even(100):
    even_list.append(i)

Does this defeat the purpose of using a generator, since it then builds the list anyway? Does this method still save some memory/time?

Or is the below method without using generators just as efficient?

def even(k):
    evens_list = []
    for i in range(k):
        if i % 2 == 0:
            evens_list.append(i)
    return evens_list

In this case in what exact cases are generators useful?

jan93
    Typically to save memory (not all elements need to exists at the same time), or time (if you think frequently only the first k elements are necessary), or for things that do not end (like the stream of a webcam). – Willem Van Onsem Sep 22 '18 at 15:30
  • @WillemVanOnsem I think the question is more geared towards whether `for i in even(100):` generates the whole sequence in one go anyway – roganjosh Sep 22 '18 at 15:32
  • There are some occasions when constructing a function is much simpler/cleaner via yield than needing to construct the result in the function. – Stephen Rauch Sep 22 '18 at 15:33
  • @roganjosh: the first two points actually apply to `for i in even(100)` as well. – Willem Van Onsem Sep 22 '18 at 15:42

3 Answers


Does this defeat the purpose of using a generator, since it then builds the list anyway? In what exact cases are generators useful?

This is a bit opinion-based, but there are some situations where a list might not do the trick (for example because of hardware limitations).

Saving CPU cycles (time)

Imagine that you have a source of even numbers, and then want to take the sum of the first five of them. In Python we could do that with itertools.islice, like:

from itertools import islice

sumfirst5even = sum(islice(even(100), 5))

If we first generated a full list of the even numbers (not knowing what we would later do with that list), then we would have spent a lot of CPU cycles on constructing that list, and these are wasted.

By using a generator, we restrict the work to only the elements we really need: only the first five elements are ever yielded, and the algorithm never computes anything past them. Yes, here it is doubtful that this will have any (significant) impact; it is even possible that the "generator protocol" requires more CPU cycles than generating a small list, so for small lists there is no advantage. But now imagine that we used even(100000): the amount of "useless CPU cycles" spent on generating an entire list can be significant.
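As a small sketch of this early stopping, the version below adds a counter attribute (purely something introduced here to observe how much work the generator actually does, not part of the original code):

```python
from itertools import islice

def even(k):
    """Yield even numbers below k, counting how many values are produced."""
    even.calls = 0  # observation hook: counts how many values were yielded
    for i in range(k):
        if i % 2 == 0:
            even.calls += 1
            yield i

# Sum only the first five even numbers; the generator is suspended after
# its fifth yield and never touches the rest of range(100_000).
total = sum(islice(even(100_000), 5))
print(total)        # 0 + 2 + 4 + 6 + 8 = 20
print(even.calls)   # 5, not 50_000
```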

Saving memory

Another potential benefit is saving memory, given we do not need all elements of the generator in memory concurrently.

Take for example the following example:

for x in even(1000):
    print(x)

If even(..) constructs a list of 1000 elements, then all these numbers need to exist as objects in memory concurrently. Depending on the Python interpreter, objects can take a significant amount of memory: in CPython, for example, an int takes 28 bytes. A list containing 500 such ints can thus take roughly 14 kB of memory (plus some extra memory for the list itself). Yes, most Python interpreters apply a "flyweight" pattern to reduce the burden of small ints (these are shared, so we do not create a separate object for each int we construct in the process), but it can still easily add up. For even(1000000), we would need roughly 14 MB of memory.
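We can make the sizes visible with sys.getsizeof. The exact byte counts are CPython implementation details and will vary between versions and platforms, so treat the numbers below as indicative only:

```python
import sys

# On 64-bit CPython a small int object takes 28 bytes (implementation detail).
print(sys.getsizeof(12345))

# A list must hold references to all of its elements at once, while a
# generator object has a fixed footprint regardless of how many values
# it will eventually produce.
evens_list = [i for i in range(1000) if i % 2 == 0]        # 500 elements
evens_gen = (i for i in range(1_000_000) if i % 2 == 0)    # no elements yet

print(sys.getsizeof(evens_list))   # kilobytes, grows with the element count
print(sys.getsizeof(evens_gen))    # small and constant
```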

If we use a generator then, depending on how we use it, we might save memory. Why? Because once we no longer need the number 123456 (the for loop has advanced to the next item), the space that object "occupied" can be recycled and given to an int object with value 123458. So, provided the way we use the generator permits this, the memory usage remains constant, whereas for a list it scales linearly. Of course the generator itself also needs to do proper management: if the generator code builds up a collection internally, the memory will of course increase as well.

On 32-bit systems, this can even cause real problems, since Python lists have a maximum length: a list can contain at most 536,870,912 elements. That is a huge number, but what if you want, for example, to generate all permutations of a given list? If we store the permutations in a list, then for a list of 13 (or more) elements, a 32-bit system will never be able to construct that list.
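itertools.permutations is itself lazy, so we can still walk through the permutations of a 13-element list one at a time, even though all 13! ≈ 6.2 billion of them could never be stored in a single list on such a system. A minimal sketch:

```python
from itertools import islice, permutations

# Take only the first three of the ~6.2 billion permutations of 13 elements;
# nothing beyond those three tuples is ever materialised.
first_three = list(islice(permutations(range(13)), 3))
print(first_three[0])   # the identity ordering (0, 1, ..., 12)
```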

"online" programs

In theoretical computer science, an "online algorithm" is defined by some researchers as an algorithm that receives its input gradually, and thus does not know the entire input in advance.

A practical example is a webcam that takes an image every second and sends it to a Python webserver. We do not know in advance what the pictures the webcam will capture over the next 24 hours will look like, but we might be interested in detecting a burglar who aims to steal something. In that case no list of frames can contain all the images up front. A generator, however, supports an elegant "protocol" where we iteratively fetch an image, check it for a burglar, and raise an alarm, like:

for frame in from_webcam():
    if contains_burglar(frame):
        send_alarm_email('Maurice Moss')

Infinite generators

We do not need webcams or other hardware to exploit the elegance of generators. Generators can yield an "infinite" sequence. Our even generator could, for example, look like:

def even():
    i = 0
    while True:
        yield i
        i += 2

This is a generator that will eventually generate all even numbers. If we keep iterating over it, eventually we will yield the number 123'456'789'012'345'678 (although it might take a very long time).
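Since the stream never ends we cannot call list() on it, but itertools.islice lets us take a finite prefix of the infinite sequence:

```python
from itertools import islice

def even():
    i = 0
    while True:
        yield i
        i += 2

# Take the first five values of the infinite stream.
print(list(islice(even(), 5)))   # [0, 2, 4, 6, 8]
```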

The above can be useful if we want to implement a program that for example keeps yielding even numbers that are palindromes. This could look like:

for i in even():
    if is_palindrome(i):
        print(i)

We can thus assume that this program keeps working, without ever needing to "update" the list of even numbers. In some pure functional languages that make lazy evaluation transparent, programs are written as if you create a list, but in fact it is typically a generator under the hood.
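A runnable version of the palindrome loop, with is_palindrome filled in as one plausible implementation (comparing the decimal digits with their reverse; the original leaves it undefined), could look like:

```python
from itertools import islice

def even():
    i = 0
    while True:
        yield i
        i += 2

def is_palindrome(n):
    # One plausible implementation: the digits read the same both ways.
    return str(n) == str(n)[::-1]

# Lazily filter the infinite stream and take only the first six hits.
palindromes = (i for i in even() if is_palindrome(i))
print(list(islice(palindromes, 6)))   # [0, 2, 4, 6, 8, 22]
```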

"enriched" generators: range(..) and friends

In Python a lot of classes do not construct lists when you iterate over them. For example, a range(1000) object does not first construct a list (it does in Python 2, but not in Python 3): the range(..) object simply represents the range. A range(..) object is not a generator, but it is a class that can construct an iterator object, which works like a generator.

Besides iterating, we can do all kinds of things with a range(..) object that are possible with lists, but not in an efficient way.

If we want to know, for example, whether 1000000000 is an element of range(400, 10000000000, 2), then we can write 1000000000 in range(400, 10000000000, 2). There is an algorithm in place that checks this without generating the range or constructing a list: it checks that the element is an int, lies within the bounds of the range(..) object (greater than or equal to 400, and less than 10000000000), and is actually reached given the step; none of this requires iterating over the range. As a result, the membership check completes instantly.

If we had generated a list, this would mean that Python had to enumerate over every element until it finally can find that element (or reaches the end of the list). For numbers like 1000000000, this can easily take minutes, hours, maybe days.
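We can check this directly; both membership and len(..) on a range with roughly five billion elements are computed arithmetically:

```python
r = range(400, 10_000_000_000, 2)

print(1_000_000_000 in r)   # True: within bounds and reachable with step 2
print(999_999_999 in r)     # False: odd, never produced by an even start + step 2
print(len(r))               # 4999999800, computed without iterating
```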

We can also "slice" the range object, which yields another range(..) object, for example:

>>> range(123, 456, 7)[1::4]
range(130, 459, 28)

With an algorithm in place we can thus instantly slice a range(..) object into a new range(..) object, whereas slicing a list takes linear time, which for huge lists can again cost significant time and memory.

Willem Van Onsem

Generators are shorter and more readable:

In your example, you have to create an empty list, use append and return the resulting list:

def even(k):
    evens_list = []
    for i in range(k):
        if i % 2 == 0:
            evens_list.append(i)
    return evens_list

The generator just needs yield:

def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i

And the usage is nearly the same, if you really need a list. Instead of

even_list = even(100)

the line becomes

even_list = list(even(100))
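To see that the two styles are interchangeable, a small self-contained sketch (here producing the even numbers, i.e. using i % 2 == 0; the helper names are made up for illustration):

```python
def even_list_version(k):
    """Build the list of even numbers below k eagerly."""
    evens = []
    for i in range(k):
        if i % 2 == 0:
            evens.append(i)
    return evens

def even_generator(k):
    """Yield the even numbers below k lazily."""
    for i in range(k):
        if i % 2 == 0:
            yield i

# Wrapping the generator in list() yields exactly the same result.
print(list(even_generator(100)) == even_list_version(100))   # True
```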
Daniel

Generators, and lazy evaluation in general, offer some advantages:

  • You can create infinite sequences
  • You can save a lot of memory, because the whole list is never kept in memory at once
  • They are often used for expensive IO operations, so you can retrieve data only when you actually use it

But also some drawbacks:

  • Overhead
    • The local variables of the generator function have to be kept in memory as long as the generator is alive
    • There is also a risk of memory leaks
  • Every time you want to reuse the elements of the collection, they must be regenerated
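The last drawback can be demonstrated in a couple of lines: a generator is a one-shot iterator, so consuming it a second time yields nothing:

```python
squares = (x * x for x in range(5))

print(list(squares))   # [0, 1, 4, 9, 16]
print(list(squares))   # [] -- already exhausted; it must be recreated to reuse
```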
Mikedev
  • In my example above it does keep the list of evens in memory. So I guess my question was is the generator method any better than the original method above? – jan93 Sep 22 '18 at 15:52
  • Depends on two factors: the size of the list and the number of times you use this list. For a list of 100 elements I think there is no space problem. So the remaining factor is how many times you use this even-numbers list: if you use it once you can use the generator, otherwise I suggest the list – Mikedev Sep 22 '18 at 15:57
  • I don't know if you know but there are two shorter way to generate the even numbers: `filter(lambda el: el % 2 == 1, range(100))` is a generator or the "generator comprehension" `(el for el in range(100) if el % 2 == 1)` is also a generator – Mikedev Sep 22 '18 at 16:00
  • so if my list was for 1000000000 elements then my generator method would be more efficient? I guess the thing im struggling to understand is that both methods involve appending values to a list, so what makes the generator one more efficient after i've added all the values to a list anyways... – jan93 Sep 22 '18 at 16:00
  • It's not more efficient, but on a lot of computers you can't keep 1000000000 elements together in memory, so you are forced to use a generator to process this sequence of data – Mikedev Sep 22 '18 at 16:03