
I was taking a look at the code of a coworker and I felt like this was an unnecessary use of the yield statement. It was something like this:

import re
from typing import List

def standardize_text(text: str):
    pattern = r"ABC" # some regex
    yield re.sub(pattern, "X", text)

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

I understand the use of yield in preprocess_docs so that I can return a generator, which would be helpful if docs is a large list. But I don't understand the value of the yield in the standardize_text function. To me, a return statement would do the exact same thing.

Is there a reason why that yield would be useful?

martineau
dr_otter
  • I think you are right - I can't see the benefit of doing a single yield of a string. – Tony Suffolk 66 Jun 26 '21 at 23:21
  • I'm curious why you didn't just ask your coworker... (for the record, I don't see a good reason, but I'm no expert on Python generators) – Robin Zigmond Jun 26 '21 at 23:22
  • `return` wouldn't do the same thing, it would make `preprocess_docs` a simple generator of values, rather than a generator of (single element) generators. I can't imagine *why* you'd want that, but there is a difference. Changing `preprocess_docs` to use `yield from` instead of `yield` would have the same effect (though it would almost certainly be slower than `return` with plain `yield`). – ShadowRanger Jun 26 '21 at 23:25
  • Yeah, not useful. Also, if you're going to annotate your parameters, you should annotate your return type. If your coworker had type annotated this code (using mypy to make sure the annotation was correct) they would probably have realized on their own that returning a `Generator[Generator[str]]` wasn't actually what they wanted to do. – Samwise Jun 26 '21 at 23:28
  • Have a look at https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do – cup Jun 26 '21 at 23:36
  • If this is a standalone function, then your coworker is confused. The only rational explanation is that `standardize_text` is getting passed to some pre-existing API that *requires* a generator (and, perhaps, the fact that his particular use case only generates one value is incidental). – Silvio Mayolo Jun 26 '21 at 23:43

1 Answer


> To me, a return statement would do the exact same thing.

Using return instead wouldn't be the same as yield, as explained in ShadowRanger's comment.

With yield, calling the function gives you a generator object:

>>> standardize_text("ABCD")
<generator object standardize_text at 0x10561f740>

Generators can produce more than one result (unlike functions that use return). This generator happens to produce exactly one item, which is a string (the result of re.sub). You can collect the generator's results into a list(), for example, or just grab the first result with next():

>>> list(standardize_text("ABCD"))
['XD']

>>> g = standardize_text("ABCD")
>>> next(g)
'XD'
>>> next(g) # raises StopIteration, indicating the generator has finished
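To make the exhaustion behaviour concrete, here is a minimal sketch that repeats the question's function and shows both ways of calling next() on a finished generator. The variable names (first, second, exhausted) are only for illustration.

```python
import re


def standardize_text(text: str):
    pattern = r"ABC"  # placeholder regex from the question
    yield re.sub(pattern, "X", text)


g = standardize_text("ABCD")
first = next(g)          # the single item the generator produces
second = next(g, None)   # generator is finished, so the default is returned

try:
    next(g)              # without a default, a finished generator raises
    exhausted = False
except StopIteration:
    exhausted = True
```

Passing a default to next() is one way to avoid handling StopIteration yourself when you only want the first item.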

If we change the function to use return:

def standardize_text(text: str):
    pattern = r"ABC" # some regex
    return re.sub(pattern, "X", text)

Then calling the function returns the string directly; no list() or next() is needed.

>>> standardize_text("ABCD")
'XD'
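As Samwise's comment suggests, annotating the return type makes the difference visible before the code even runs. The following is a rough sketch; the two function names are hypothetical (chosen so both versions can sit side by side), and Iterator[str] is just one reasonable way to annotate a generator that only yields strings.

```python
import re
from typing import Iterator


def standardize_text_yield(text: str) -> Iterator[str]:
    # Generator function: calling it returns an iterator that yields one string.
    yield re.sub(r"ABC", "X", text)


def standardize_text_return(text: str) -> str:
    # Plain function: calling it returns the string itself.
    return re.sub(r"ABC", "X", text)


as_generator = standardize_text_yield("ABCD")
as_string = standardize_text_return("ABCD")
```

With annotations like these, a type checker such as mypy should flag any caller that treats the result of the yield version as a str.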

> Is there a reason why that yield would be useful?

In the standardize_text function, no, not really. Your preprocess_docs function, however, does benefit from yield: it returns a generator that produces one result for each item in docs. Those results are either generators themselves (in your original code, where standardize_text uses yield) or strings (if standardize_text is changed to use return).

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

# returns a generator because the implementation uses "yield"
>>> preprocess_docs(["ABCD", "AAABC"])
<generator object preprocess_docs at 0x10561f820>

# with standardize_text using "yield re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
... 
<generator object standardize_text at 0x1056cce40>
<generator object standardize_text at 0x1056cceb0>


# with standardize_text using "return re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
... 
XD
AAX
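ShadowRanger's comment mentions a third option: keep the yield in standardize_text but have preprocess_docs use yield from, which flattens each single-item generator. Here is a small sketch of that; the names preprocess_docs_nested and preprocess_docs_flat are hypothetical and exist only so the two variants can be compared.

```python
import re
from typing import Iterable, Iterator


def standardize_text(text: str) -> Iterator[str]:
    # Same as the question's version: yields exactly one string.
    yield re.sub(r"ABC", "X", text)


def preprocess_docs_nested(docs: Iterable[str]) -> Iterator[Iterator[str]]:
    for doc in docs:
        yield standardize_text(doc)       # yields the inner generator object


def preprocess_docs_flat(docs: Iterable[str]) -> Iterator[str]:
    for doc in docs:
        yield from standardize_text(doc)  # yields the inner generator's items


docs = ["ABCD", "AAABC"]
nested = [list(inner) for inner in preprocess_docs_nested(docs)]
flat = list(preprocess_docs_flat(docs))
```

This gives the caller plain strings, but as the comment notes, simply using return in standardize_text is the more direct fix.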

Note: Before async/await was added in Python 3.5, some concurrency libraries used yield in the way that await is used now; Twisted's @inlineCallbacks is one example. I don't think this is directly relevant to your question, but I've included it for completeness.

jtbandes