
Let's say we have a dummy function:

async def foo(arg):
    result = await some_remote_call(arg)
    return result.upper()

What's the difference between:

import asyncio    

coros = []
for i in range(5):
    coros.append(foo(i))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(coros))

And:

import asyncio

futures = []
for i in range(5):
    futures.append(asyncio.ensure_future(foo(i)))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(futures))

Note: The example returns a result, but this isn't the focus of the question. When the return value matters, use gather() instead of wait().
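Here is a minimal sketch of that difference. It assumes Python 3.7+ and replaces the undefined some_remote_call with a hypothetical fake_remote_call that only sleeps. gather() hands back the return values in argument order. wait() hands back (done, pending) sets of Task objects, and you read their results yourself:

```python
import asyncio

# Hypothetical stand-in for some_remote_call: just sleeps briefly
async def fake_remote_call(arg):
    await asyncio.sleep(0.01)
    return f"result {arg}"

async def foo(arg):
    result = await fake_remote_call(arg)
    return result.upper()

async def main():
    # gather() returns the results themselves, in argument order
    results = await asyncio.gather(*(foo(i) for i in range(5)))

    # wait() returns two sets of Task objects; results must be read from them
    tasks = [asyncio.create_task(foo(i)) for i in range(5)]
    done, pending = await asyncio.wait(tasks)
    results_from_wait = [t.result() for t in tasks]
    return results, results_from_wait, done, pending

results, results_from_wait, done, pending = asyncio.run(main())
print(results)  # ['RESULT 0', 'RESULT 1', 'RESULT 2', 'RESULT 3', 'RESULT 4']
```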

Regardless of return value, I'm looking for clarity on ensure_future(). wait(coros) and wait(futures) both run the coroutines, so when and why should a coroutine be wrapped in ensure_future?

Basically, what's the Right Way (tm) to run a bunch of non-blocking operations using Python 3.5's async?

For extra credit, what if I want to batch the calls? For example, I need to call some_remote_call(...) 1000 times, but I don't want to crush the web server/database/etc with 1000 simultaneous connections. This is doable with a thread or process pool, but is there a way to do this with asyncio?

2020 update (Python 3.7+): Don't use these snippets. Instead use:

import asyncio

async def do_something_async():
    tasks = []
    for i in range(5):
        tasks.append(asyncio.create_task(foo(i)))
    await asyncio.gather(*tasks)

def do_something():
    asyncio.run(do_something_async())

Also consider using Trio, a robust 3rd party alternative to asyncio.

knite

5 Answers


A coroutine is conceptually similar to a generator function that can both yield values and accept values from the outside. The benefit of using a coroutine is that we can pause the execution of a function and resume it later. In the case of a network operation, it makes sense to pause the execution of a function while we're waiting for the response. We can use that time to run other functions.
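Here is a small sketch of that pause-and-resume behaviour. It assumes Python 3.7+; fetch() is a hypothetical coroutine and asyncio.sleep(0) stands in for waiting on the network. Each coroutine pauses at the await, and the loop runs the other one in the meantime:

```python
import asyncio

log = []

async def fetch(name):
    log.append(f"{name} start")
    # Stand-in for waiting on network I/O: suspends this coroutine
    # and lets the event loop run something else in the meantime
    await asyncio.sleep(0)
    log.append(f"{name} end")

async def main():
    await asyncio.gather(fetch("A"), fetch("B"))

asyncio.run(main())
print(log)  # ['A start', 'B start', 'A end', 'B end']
```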

A future is like the Promise objects from JavaScript: a placeholder for a value that will be materialized in the future. In the case above, while waiting on network I/O, a function can give us a container, a promise that it will fill the container with the value when the operation completes. We hold on to the future object, and when it's fulfilled, we can call a method on it to retrieve the actual result.
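Here is a rough sketch of a future used as a placeholder. It assumes Python 3.7+, and the call_later() timer is a hypothetical stand-in for a network operation finishing:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # An empty container: no result yet
    fut = loop.create_future()
    print(fut.done())  # False

    # Simulate an I/O operation completing later: the loop fills
    # the container after 0.01 seconds
    loop.call_later(0.01, fut.set_result, "response body")

    # Awaiting suspends main() until the future has a value
    value = await fut
    return fut.done(), value

done_flag, value = asyncio.run(main())
```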

Direct answer: You don't need ensure_future if you don't need the results. It is useful if you need the results or want to retrieve exceptions that occurred.

Extra credit: I would choose run_in_executor and pass an Executor instance to control the maximum number of workers.
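Here is a minimal sketch of that approach. It assumes Python 3.7+ and a hypothetical blocking_remote_call, which is a plain, synchronous function. run_in_executor takes a regular function to run in the pool, not a coroutine. max_workers caps how many calls are in flight at once:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical *blocking* stand-in for a remote call (e.g. a
# synchronous HTTP or database client); not a coroutine
def blocking_remote_call(arg):
    time.sleep(0.01)
    return arg * 2

async def main():
    loop = asyncio.get_running_loop()
    # At most 10 calls run at the same time, one per worker thread
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [
            loop.run_in_executor(executor, blocking_remote_call, i)
            for i in range(100)
        ]
        return await asyncio.gather(*futures)

results = asyncio.run(main())
```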

Explanations and sample code

In the first example, you are using coroutines. The wait function takes a bunch of coroutines and combines them, so wait() finishes when all the coroutines are exhausted (have completed and returned their values).

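from asyncio import get_event_loop, wait
loop = get_event_loop()
# Bare coroutines in wait(): deprecated in 3.8, TypeError since 3.11
loop.run_until_complete(wait(coros))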

The run_until_complete method keeps the loop running until the execution is finished. Notice that you are not getting the results of the async execution in this case.

In the second example, you are using the ensure_future function to wrap a coroutine and return a Task object, which is a kind of Future. The coroutine is scheduled to be executed in the main event loop when you call ensure_future. The returned future/task object doesn't have a value yet, but over time, when the network operations finish, it will hold the result of the operation.

from asyncio import ensure_future, get_event_loop, wait

futures = []
for i in range(5):
    futures.append(ensure_future(foo(i)))

loop = get_event_loop()
loop.run_until_complete(wait(futures))

So in this example, we're doing the same thing except we're using futures instead of just using coroutines.
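Here is a short sketch of what ensure_future hands back. It assumes Python 3.7+ and a hypothetical foo() that does no real I/O:

```python
import asyncio

# Hypothetical stand-in for the question's foo(); no real remote call
async def foo(arg):
    await asyncio.sleep(0)
    return f"value {arg}".upper()

async def main():
    # Wraps the coroutine in a Task and schedules it on the running loop
    task = asyncio.ensure_future(foo(1))
    print(isinstance(task, asyncio.Future))  # True: a Task is a Future
    print(task.done())                       # False: no value yet
    await task                               # wait until it has finished
    return task.result()                     # the value the coroutine returned

result = asyncio.run(main())
print(result)  # VALUE 1
```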

Let's look at an example of how to use asyncio/coroutines/futures:

import asyncio


async def slow_operation():
    await asyncio.sleep(1)
    return 'Future is done!'


def got_result(future):
    print(future.result())

    # We have result, so let's stop
    loop.stop()


loop = asyncio.get_event_loop()
task = loop.create_task(slow_operation())
task.add_done_callback(got_result)

# We run forever
loop.run_forever()

Here, we have used the create_task method on the loop object. ensure_future would schedule the task on the current event loop, whereas this method lets us schedule a coroutine on a loop we choose.

We also see the concept of adding a callback using the add_done_callback method on the task object.

A Task is done when the coroutine returns a value, raises an exception or gets cancelled. There are methods to check for each of these outcomes: done(), cancelled(), exception() and result().
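Here is a sketch of those three outcomes and the Task methods that report them. It assumes Python 3.7+; ok(), fails() and hangs() are hypothetical coroutines:

```python
import asyncio

async def ok():
    return "fine"

async def fails():
    raise ValueError("boom")

async def hangs():
    await asyncio.sleep(3600)

async def main():
    t_ok = asyncio.create_task(ok())
    t_err = asyncio.create_task(fails())
    t_cancel = asyncio.create_task(hangs())
    await asyncio.sleep(0)  # let all three tasks start
    t_cancel.cancel()
    # return_exceptions=True collects the ValueError and the
    # cancellation instead of raising them here
    await asyncio.gather(t_ok, t_err, t_cancel, return_exceptions=True)

    return {
        "ok": (t_ok.done(), t_ok.result()),
        "err": (t_err.done(), type(t_err.exception()).__name__),
        "cancel": (t_cancel.done(), t_cancel.cancelled()),
    }

outcomes = asyncio.run(main())
```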

I have written some blog posts on these topics which might help:

Of course, you can find more details in the official manual: https://docs.python.org/3/library/asyncio.html

Innat
masnun
  • I've updated my question to be a bit more clear - if I don't need the result from the coroutine, do I still need to use `ensure_future()`? And if I do need the result, can't I just use `run_until_complete(gather(coros))`? – knite Jan 12 '16 at 21:39
  • `ensure_future` schedules the coroutine to be executed in the event loop. So I would say yes, it's required. But of course you can schedule the coroutines using other functions/methods too. Yes, you can use `gather()` - but gather will wait until all the responses are collected. – masnun Jan 12 '16 at 21:42
  • Also the coroutines would need to be on the same event loop. You can read more details here: https://docs.python.org/3/library/asyncio-task.html#asyncio.gather - I guess `gather` would work fine for your case. – masnun Jan 12 '16 at 21:44
  • If I `wait()` on coroutines directly instead of wrapping them in `ensure_future`, they still execute. So, in the case where I don't need the return values, it's still not clear to me when or why to use `ensure_future`? – knite Jan 12 '16 at 21:49
  • If you don't need the results then `wait` would work just fine. You don't need `ensure_future`. I am sorry I misunderstood your first comment. – masnun Jan 12 '16 at 21:54
  • However, notice that `ensure_future` immediately schedules the coroutine for execution. So the scheduling takes place as you iterate over each item. With `wait` you schedule them after you have collected the coroutines. But this might be a minor detail in this case. – masnun Jan 12 '16 at 21:55
  • Thanks for the clarification. Perhaps you could tighten up your answer? Also, any thoughts on the "extra credit" about batching coroutines? – knite Jan 12 '16 at 21:58
  • I have updated the answer. Also answered the extra credit section in there. – masnun Jan 12 '16 at 22:08
  • @AbuAshrafMasnun @knite `gather` and `wait` actually wrap the given coroutines as tasks using `ensure_future` (see the sources [here](https://github.com/python/asyncio/blob/master/asyncio/tasks.py#L614) and [here](https://github.com/python/asyncio/blob/master/asyncio/tasks.py#L346)). So there is no point in using `ensure_future` beforehand, and it has nothing to do with getting the results or not. – Vincent Jan 14 '16 at 09:57
  • @Vincent You are right. Thanks for pointing that out. I was unable to find the docs for `wait` and responded from memory. `ensure_future` allows us to create a single task at a time, and then we can add a callback to it to retrieve the result as soon as it's available. `wait` does that on a bunch of `coroutines`, so results are retrieved together. That was my point. – masnun Jan 14 '16 at 12:26
  • @AbuAshrafMasnun @knite Also, `ensure_future` has a `loop` argument, so there is no reason to use `loop.create_task` over `ensure_future`. And `run_in_executor` won't work with coroutines, a [semaphore](https://docs.python.org/3.4/library/asyncio-sync.html#semaphores) should be used instead. – Vincent Jan 14 '16 at 12:51
  • @Vincent can you show some example how to use semaphore instead of run_in_executor, because after reading the semaphore doc I'm absolutely confused – comalex3 Jul 01 '17 at 08:52
  • @vincent there is a reason to use `create_task` over `ensure_future`, see [docs](https://docs.python.org/3/library/asyncio-task.html#asyncio.ensure_future). Quote `create_task() (added in Python 3.7) is the preferable way for spawning new tasks. ` – omni Jul 12 '18 at 10:39
  • This should be marked as the correct answer! Well done @masnun – Daniel van Flymen Jul 30 '18 at 09:26
  • The 1st paragraph of this answer is by far the best use case/explanation I've heard for coroutines! – MikeyE Mar 15 '20 at 22:57
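The semaphore approach suggested in the comments above can be sketched like this. It assumes Python 3.7+. fake_remote_call and MAX_CONCURRENT are hypothetical, and the in_flight/peak counters exist only to show the limit holding. All 1000 tasks are created, but at most MAX_CONCURRENT of them are inside the remote call at any moment:

```python
import asyncio

MAX_CONCURRENT = 10  # hypothetical limit; tune for your server
in_flight = 0
peak = 0

# Hypothetical stand-in for some_remote_call
async def fake_remote_call(arg):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.001)
    in_flight -= 1
    return arg

async def limited_call(sem, arg):
    # At most MAX_CONCURRENT coroutines can be inside this block at once;
    # the rest wait here until a slot is released
    async with sem:
        return await fake_remote_call(arg)

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(limited_call(sem, i) for i in range(1000)))

results = asyncio.run(main())
print(peak)  # never more than MAX_CONCURRENT
```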

TL;DR

  • Invoking a coroutine function (async def) will NOT run it. It returns a coroutine object, just as generator functions return generator objects.
  • await retrieves values from coroutines, i.e. "calls" the coroutine.
  • ensure_future/create_task wrap a coroutine and schedule it to run on the event loop on the next iteration, but will not wait for it to finish; it's like a daemon thread.
  • By awaiting a coroutine or a task wrapping a coroutine, you can always retrieve the result returned by the coroutine; the difference is their execution order.
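Here is a minimal sketch of the first point. It assumes Python 3.7+, and greet() is a hypothetical coroutine function. Calling it only creates a coroutine object, and its body runs once something drives it:

```python
import asyncio
import inspect

calls = []

async def greet():
    calls.append("greet ran")
    return "hello"

coro = greet()                    # no code inside greet() has run yet
print(inspect.iscoroutine(coro))  # True
print(calls)                      # []

# Only when something drives the coroutine does its body execute;
# here asyncio.run() wraps it in a task and runs it to completion
result = asyncio.run(coro)
print(calls)                      # ['greet ran']
print(result)                     # hello
```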

Some code examples

Let's first clear some terms:

  • coroutine function, the function you define with async def;
  • coroutine object, what you get when you "call" a coroutine function;
  • task, an object wrapped around a coroutine object to run on the event loop;
  • awaitable, something that you can await, like a task, a future or a plain coroutine object.
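One way to tell these terms apart in code is with the inspect module. This sketch assumes Python 3.7+; log_time here is a hypothetical, simplified coroutine function that just returns its argument:

```python
import asyncio
import inspect

async def log_time(word):  # coroutine function
    return word

async def main():
    coro = log_time("a")                       # coroutine object
    task = asyncio.create_task(log_time("b"))  # task
    checks = {
        "function": inspect.iscoroutinefunction(log_time),
        "coro is coroutine": inspect.iscoroutine(coro),
        "task is coroutine": inspect.iscoroutine(task),
        "coro awaitable": inspect.isawaitable(coro),
        "task awaitable": inspect.isawaitable(task),
        "task is future": isinstance(task, asyncio.Future),
    }
    # Await both so nothing is left un-awaited
    await coro
    await task
    return checks

checks = asyncio.run(main())
```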

The term coroutine can be both coroutine function and coroutine object depending on the context, but it should be easy enough for you to tell the differences.

Case 1, await on a coroutine

We create two coroutines, await one, and use create_task to run the other one.

import asyncio
import time

# coroutine function
async def log_time(word):
    print(f'{time.time()} - {word}')

async def main():
    coro = log_time('plain await')
    task = asyncio.create_task(log_time('create_task'))  # <- runs in next iteration
    await coro  # <-- run directly
    await task

if __name__ == "__main__":
    asyncio.run(main())

You will get output like this; the plain coroutine was executed first, as expected:

1539486251.7055213 - plain await
1539486251.7055705 - create_task

This is because coro was executed directly, while task was executed in the next iteration.

Case 2, yielding control to event loop

By awaiting asyncio.sleep(1), control is yielded back to the loop, so we should see a different result:

async def main():
    coro = log_time('plain await')
    task = asyncio.create_task(log_time('create_task'))  # <- runs in next iteration
    await asyncio.sleep(1)  # <- loop got control, and runs task
    await coro  # <-- run directly
    await task

You will get output like this; the execution order is reversed:

1539486378.5244057 - create_task
1539486379.5252144 - plain await

When main() awaited asyncio.sleep(1), control was yielded back to the event loop, which checked for tasks that were ready to run and ran the task created by create_task first.

Although we invoked the coroutine function first, without awaiting it we only created a coroutine object; it does NOT start automatically. Then we created a new coroutine and wrapped it in a create_task call. create_task not only wraps the coroutine, but also schedules the task to run on the next iteration. As a result, create_task is executed before plain await.

The magic here is yielding control back to the loop; you can use asyncio.sleep(0) to achieve the same result.
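Here is a sketch of the asyncio.sleep(0) variant. It assumes Python 3.7+. log_word is a hypothetical variant of log_time that records words in a list instead of printing timestamps, so the order is easy to check:

```python
import asyncio

order = []

async def log_word(word):
    order.append(word)

async def main():
    coro = log_word("plain await")
    task = asyncio.create_task(log_word("create_task"))
    await asyncio.sleep(0)  # yield control without actually waiting
    await coro
    await task

asyncio.run(main())
print(order)  # ['create_task', 'plain await']
```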

Despite these differences, one thing stays the same: if you await a coroutine or a task wrapping a coroutine, i.e. an awaitable, you can always retrieve the result it returns.
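Here is a short sketch of that point. It assumes Python 3.7+ and a hypothetical compute() coroutine. A plain coroutine, a task, and an already finished task all give back their result when awaited:

```python
import asyncio

async def compute(x):
    await asyncio.sleep(0)
    return x * 10

async def main():
    from_coro = await compute(1)                       # plain coroutine
    from_task = await asyncio.create_task(compute(2))  # task wrapping one
    task = asyncio.create_task(compute(3))
    await asyncio.sleep(0.01)                          # task finishes meanwhile
    # Awaiting an already finished task just returns its result
    from_done_task = await task
    return from_coro, from_task, from_done_task

values = asyncio.run(main())
print(values)  # (10, 20, 30)
```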

Under the hood

asyncio.create_task calls asyncio.tasks.Task(), which calls loop.call_soon to schedule the task's first step. loop.call_soon puts a callback handle in loop._ready, and during each iteration the loop runs every callback in loop._ready.

asyncio.wait, asyncio.ensure_future and asyncio.gather actually call loop.create_task directly or indirectly.

Also note in the docs:

Callbacks are called in the order in which they are registered. Each callback will be called exactly once.
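Here is a minimal sketch of that ordering guarantee using loop.call_soon directly. It assumes Python 3.7+:

```python
import asyncio

calls = []

async def main():
    loop = asyncio.get_running_loop()
    for name in ("first", "second", "third"):
        loop.call_soon(calls.append, name)
    # Yield once so the loop runs everything queued before main() resumes
    await asyncio.sleep(0)

asyncio.run(main())
print(calls)  # ['first', 'second', 'third']
```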

ospider
  • Thanks for a clean explanation! Have to say, it's a pretty terrible design. High-level API is leaking low-level abstraction, which overcomplicates the API. – Boris Burkov Feb 20 '19 at 11:49
  • check out the curio project, which is well-designed – ospider Feb 20 '19 at 13:02
  • Nice explanation! I think the effect of the `await task2` call could be clarified. In both examples, the loop.create_task() call is what schedules task2 on the event loop. So in both exs you can delete the `await task2` and still task2 will eventually run. In ex2 the behaviour will be identical, as the `await task2` I believe is just scheduling the already completed task (which won't run a second time), whereas in ex1 the behaviour will be slightly different since task2 won't be executed until main is complete. To see the difference, add `print("end of main")` at the end of ex1's main – Andrew Jan 23 '20 at 11:44

A comment by Vincent linked to https://github.com/python/asyncio/blob/master/asyncio/tasks.py#L346, which shows that wait() wraps the coroutines in ensure_future() for you!

In other words, we do need a future, and coroutines will be silently transformed into them.
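Here is a small sketch of that conversion. It assumes Python 3.7+. ensure_future wraps a coroutine in a new Task, but returns an object that is already a future unchanged:

```python
import asyncio

async def foo():
    return 42

async def main():
    coro = foo()
    task = asyncio.ensure_future(coro)  # coroutine -> new Task
    same = asyncio.ensure_future(task)  # already a Future -> returned as is
    await task
    return task, same

task, same = asyncio.run(main())
print(same is task)   # True
print(task.result())  # 42
```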

I'll update this answer when I find a definitive explanation of how to batch coroutines/futures.

Undo
knite
  • Does it mean that for a coroutine object `c`, `await c` is equivalent to `await create_task(c)`? – Alexey May 06 '20 at 16:21

From the BDFL [2013]

Tasks

  • It's a coroutine wrapped in a Future
  • class Task is a subclass of class Future
  • So it works with await too!

  • How does it differ from a bare coroutine?
  • It can make progress without waiting for it
    • As long as you wait for something else, i.e.
      • await [something_else]

With this in mind, ensure_future makes sense as a name for creating a Task, since the Future's result will be computed whether or not you await it (as long as you await something). This allows the event loop to complete your Task while you're waiting on other things. Note that as of Python 3.7, create_task is the preferred way to create a Task.

Note: I changed "yield from" in Guido's slides to "await" here for modernity.
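Here is a sketch of a Task making progress without being awaited. It assumes Python 3.7+ and a hypothetical background_job() coroutine; main() only awaits something else:

```python
import asyncio

async def background_job():
    return "computed without being awaited"

async def main():
    task = asyncio.create_task(background_job())
    print(task.done())  # False: only scheduled so far
    # We await something else; the loop uses the pause to run the task
    await asyncio.sleep(0.01)
    print(task.done())  # True
    return task.result()

result = asyncio.run(main())
```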

crizCraig

Though there are already a few very useful answers, they don't cover all the nuances. In particular, the accepted answer is no longer correct.

You should not pass coroutines to wait() directly, for compatibility with newer versions of the library.

Documentation:

Deprecated since version 3.8, will be removed in version 3.11: Passing coroutine objects to wait() directly is deprecated.

Another statement from the documentation may be useful for a deeper understanding. wait() returns sets of futures, so if you want to check whether your coroutine is in the result, you should wrap it in a future first, using create_task (the preferred way to create a task, rather than ensure_future).

wait() schedules coroutines as Tasks automatically and later returns those implicitly created Task objects in (done, pending) sets. Therefore the following code won’t work as expected:

async def foo():
    return 42

coro = foo()
done, pending = await asyncio.wait({coro})

if coro in done:
    # This branch will never be run!
    pass

Here is how the above snippet can be fixed:

async def foo():
    return 42

task = asyncio.create_task(foo())
done, pending = await asyncio.wait({task})

if task in done:
    # Everything will work as expected now.
    pass
egvo