At first sight, yield from is an algorithmic shortcut for:
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator
Which is then mostly equivalent to just:
def generator1():
    yield from generator2()
    # do more things in this generator
In English: when used inside a generator, yield from yields each element of another iterable, as if that item were coming from the first generator, from the point of view of the code calling the first generator.
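A small runnable illustration (the generator names here are made up) showing that both forms produce the same sequence for the caller:

def inner():
    yield 1
    yield 2

def outer_explicit():
    for item in inner():    # re-yield each item by hand
        yield item
    yield 3                 # do more things in this generator

def outer_delegating():
    yield from inner()      # delegate directly to inner()
    yield 3

print(list(outer_explicit()))    # [1, 2, 3]
print(list(outer_delegating()))  # [1, 2, 3]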
The main reason for its creation was to allow easy refactoring of code that relies heavily on iterators. Code using ordinary functions always could, at very little extra cost, have blocks of one function refactored into other functions which are then called: that divides tasks, simplifies reading and maintaining the code, and allows small code snippets to be reused.
So, large functions like this:
def func1():
    # some calculation
    for i in somesequence:
        # complex calculation using i
        # ...
        # ...
        # ...
    # some more code to wrap up results
    # finalizing
    # ...
Can become code like this, without drawbacks:
def func2(i):
    # complex calculation using i
    # ...
    # ...
    # ...
    return calculated_value


def func1():
    # some calculation
    for i in somesequence:
        func2(i)
    # some more code to wrap up results
    # finalizing
    # ...
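As a concrete toy instance of that pattern (the calculation and the parameterization are invented purely for illustration):

def func2(i):
    # the "complex calculation using i", extracted into its own function
    return i * i

def func1(somesequence):
    total = 0
    for i in somesequence:
        total += func2(i)
    # some more code to wrap up results
    return total

print(func1(range(5)))   # 30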
When getting to iterators, however, the form
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator


for item in generator1():
    # do things
requires that, for each item consumed from generator2, the running context first be switched to generator1, where nothing is done, and then the context has to be switched to generator2. When that one yields a value, there is another intermediate context switch to generator1 before the value reaches the actual code consuming those values.
With yield from, these intermediate context switches are avoided, which can save quite a few resources if there are a lot of iterators chained: the context switches straight from the code consuming the outermost generator to the innermost generator, skipping the contexts of the intermediate generators altogether, until the inner ones are exhausted.
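For example, in a small made-up chain of generators, the values of the innermost generator reach the consuming loop directly, without the intermediate generators having to re-yield each item:

def innermost():
    yield from range(3)

def middle():
    yield from innermost()

def outermost():
    yield from middle()

for item in outermost():
    print(item)   # 0, 1, 2 - delivered without pausing in middle() or outermost()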
Later on, the language took advantage of this "tunnelling" through intermediate contexts to use these generators as co-routines: functions that can make asynchronous calls. With the proper framework in place, as described in https://www.python.org/dev/peps/pep-3156/ , these co-routines are written so that whenever they call a function that would take a long time to resolve (due to a network operation, or a CPU-intensive operation that can be offloaded to another thread), that call is made with a yield from statement. The framework main loop then arranges for the expensive function to be properly scheduled and retakes execution (the framework main loop is always the code calling the co-routines themselves). When the expensive result is ready, the framework makes the called co-routine behave like an exhausted generator, and execution of the first co-routine resumes.
From the programmer's point of view, it is as if the code were running straight through, with no interruptions. From the process's point of view, the co-routine was paused at the point of the expensive call while other code (possibly parallel calls to the same co-routine) kept running.
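A minimal, framework-free sketch of that mechanism (all names here are made up for illustration): the drive() function plays the role of the event loop, resuming the co-routine and delivering the result, at which point the delegated-to generator finishes just as an exhausted generator would.

def fake_fetch(url):
    # stand-in for an expensive call; the result arrives when the
    # driver below resumes us with .send()
    result = yield ("fetch", url)
    return result

def crawler(url):
    page = yield from fake_fetch(url)   # pauses here until the driver has the result
    return len(page)

def drive(coro):
    request = next(coro)                # run until the co-routine asks for something
    # ... a real event loop would schedule the actual work here ...
    fake_page = "<html>%s</html>" % request[1]
    try:
        coro.send(fake_page)            # deliver the result; fake_fetch now finishes
    except StopIteration as stop:
        return stop.value               # the co-routine's own return value

print(drive(crawler("http://example.com")))   # 31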
So, as part of a web crawler, one might write some code along the lines of:
@asyncio.coroutine
def crawler(url):
    page_content = yield from async_http_fetch(url)
    urls = parse(page_content)
    ...
Which could fetch tens of HTML pages concurrently when called from the asyncio loop.
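For illustration, a runnable sketch in that same pre-3.5, generator-based style: async_http_fetch above is a hypothetical fetch function, so asyncio.sleep stands in for it here, and note that @asyncio.coroutine itself was removed in Python 3.11, so this form only runs on older interpreters.

import asyncio

@asyncio.coroutine
def crawler(url):
    yield from asyncio.sleep(0.1)        # stands in for async_http_fetch(url)
    page_content = "<html>fetched %s</html>" % url
    return len(page_content)

loop = asyncio.get_event_loop()
urls = ["http://example.com/%d" % i for i in range(10)]
sizes = loop.run_until_complete(asyncio.gather(*(crawler(u) for u in urls)))
print(sizes)   # the ten "fetches" overlap instead of running one after another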
Python 3.4 added the asyncio module to the stdlib as the default provider for this kind of functionality. It worked so well that, in Python 3.5, several new keywords were added to the language to distinguish co-routines and asynchronous calls from the generator usage described above. These are described in https://www.python.org/dev/peps/pep-0492/
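For comparison, the same crawler sketch in the PEP 492 syntax: async def replaces @asyncio.coroutine and await replaces yield from (the fetch function is still a made-up stand-in, not a real HTTP client):

import asyncio

async def async_http_fetch(url):
    await asyncio.sleep(0.1)             # a real client would do network I/O here
    return "<html>fetched %s</html>" % url

async def crawler(url):
    page_content = await async_http_fetch(url)
    return len(page_content)

print(asyncio.run(crawler("http://example.com")))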