
I am really confused about what the keyword "yield" returns in a generator. What are the real use cases for it, and when should I use it?

How is it different from the "return" keyword?

What I have learnt is that generators are better in terms of performance, but I cannot think of any real use case if asked in an interview.

Thanks in advance!

young_minds1

3 Answers


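A return statement sends a specified value back to its caller and ends the function, whereas yield hands back one value and suspends the function, so it can resume later and produce a whole sequence of values. We should use yield when we want to produce a sequence of values one at a time, but don't want to store the entire sequence in memory.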
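A minimal sketch of that difference. The function names `squares_list` and `squares_gen` are made up for illustration only:

```python
def squares_list(n):
    """Build the whole list first, then return it once."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    """Yield one square at a time; the function pauses after each yield."""
    for i in range(n):
        yield i * i


gen = squares_gen(3)   # nothing has been computed yet
first = next(gen)      # runs the body up to the first yield
second = next(gen)     # resumes right after the previous yield
rest = list(gen)       # drains whatever values are left
```

Calling `squares_gen(3)` does not run the function body; it only creates a generator object, and each `next()` call advances it by one value.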

You can read more about the differences here

NiKS
  • `Return is used to return only one value` Technically inaccurate. – Error - Syntactical Remorse Nov 15 '19 at 15:42
  • Yeah I have corrected it. – NiKS Nov 15 '19 at 15:43
  • Also inaccurate "We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory.". Yield is used to create generators. Generators can be used to create iterators. Yield remembers state, so it can be used to create an iterator over an iterable object/sequence; however it does not save memory when being used to iterate over a sequence. – Error - Syntactical Remorse Nov 15 '19 at 15:46

The difference between yielding a single value and returning a single value is that a function containing yield hands its values back through an iterator (a generator, in Python), which is also called a stream or enumerator in other languages. A list is a familiar example of something you can enumerate, and to simplify this answer, you can pretend that all iterators are just lists.

The difference between yielding many values (say, inside a for loop) and returning an iterator (or list) is when the values are calculated. With yield, each value is calculated only when the caller asks for it. If the caller doesn't need the whole list of values, the rest of the list is never calculated.

However, when returning a list, the entire list must be calculated beforehand. Say you have this function:

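def findIndex(enumerator, item):
    # Walk the iterable and return the position of the first match.
    for idx, value in enumerate(enumerator):
        if value == item:
            return idx
    # Falls through (returning None) if the item is never found.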

It takes an iterator, and searches for an item, returning the index of that item.

Now, here's where iterators make a difference. Imagine that you are going to call findIndex like this:

findIndex(gimme_the_values(), 3)

Say that gimme_the_values is some function which calculates a list of integers; however, let's also say that the process of calculating those integers takes a long time, for some reason. Maybe you're scanning through a 1500-page document, looking for every number that occurs in it, and that's the list of values that you're returning.

Now, let's say that the first several numbers to occur in this document are 7, 1998, 3, and 18, and that the 3 occurs on the 40th page. If you define gimme_the_values to use yield, you can stop generating that "list" at page 40; you'll never even scan for and return the 18. However, if gimme_the_values returns a list instead of yielding, you have to scan every page and generate the whole list, even though you really only need the first three values in this case.
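Here is a rough, runnable sketch of that scenario. The page contents in `NUMBERS_BY_PAGE` and the `pages_scanned` log are invented stand-ins for the real document scan; only the 3 on page 40 comes from the description above:

```python
# Hypothetical stand-in for the document: page number -> numbers found on it.
NUMBERS_BY_PAGE = {2: [7], 15: [1998], 40: [3], 41: [18]}
TOTAL_PAGES = 1500

pages_scanned = []  # records which pages were actually scanned


def gimme_the_values():
    """Yield every number in the document, scanning one page at a time."""
    for page in range(1, TOTAL_PAGES + 1):
        pages_scanned.append(page)  # stands in for the slow scanning work
        for number in NUMBERS_BY_PAGE.get(page, []):
            yield number


def findIndex(enumerator, item):
    for idx, value in enumerate(enumerator):
        if value == item:
            return idx
    return None


# Lazy: the generator is abandoned as soon as the 3 is found.
index = findIndex(gimme_the_values(), 3)
lazy_pages = len(pages_scanned)

# Eager: list() forces every page to be scanned before the search starts.
pages_scanned.clear()
index_eager = findIndex(list(gimme_the_values()), 3)
eager_pages = len(pages_scanned)
```

Both calls find the 3 at index 2, but the lazy version scans 40 pages while the eager one scans all 1500.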

jpaugh

This may be useful for text processing. If you have a large corpus and you want to normalize the characters in it, you apply a normalize function to every text, for example.

You would like a function that loads each text only when you are about to use it, rather than the complete corpus, because the corpus may be too large for your computer's memory.

Example:

import os

from lxml import etree

def get_data(data_directory, parser):
    # Parse one XML file at a time and yield its root element.
    # Files that are not XML are skipped.
    for filename in os.listdir(data_directory):
        if filename.endswith(".xml"):
            tree = etree.parse(os.path.join(data_directory, filename), parser=parser)
            yield tree.getroot()

You have a directory where all your files are. You want to parse only the XML files.

Thanks to the yield statement, you can write the processing loop as if you had loaded all your data, even though only one file is parsed at a time:

for root in get_data(DATA_DIRECTORY, parser):
    result = process(root)
    save_result(result)
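
The same pattern can be tried without lxml. The sketch below is only an illustration under stated assumptions: it swaps the XML files for plain `.txt` files in a temporary directory, and `read_texts` and `normalize` are hypothetical helpers, not part of any library:

```python
import os
import tempfile


def read_texts(data_directory):
    """Yield (filename, text) pairs, reading one file at a time."""
    for filename in sorted(os.listdir(data_directory)):
        if filename.endswith(".txt"):
            path = os.path.join(data_directory, filename)
            with open(path, encoding="utf-8") as handle:
                yield filename, handle.read()


def normalize(text):
    """Lower-case the text and collapse runs of whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


with tempfile.TemporaryDirectory() as corpus_dir:
    # Tiny made-up corpus; the .md file should be skipped.
    samples = {"a.txt": "Hello   World", "b.txt": "YIELD\tis  Lazy", "notes.md": "skipped"}
    for name, content in samples.items():
        with open(os.path.join(corpus_dir, name), "w", encoding="utf-8") as handle:
            handle.write(content)

    # Only one file's text is held in memory per loop iteration.
    results = {name: normalize(text) for name, text in read_texts(corpus_dir)}
```
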
clemsciences