I was migrating a production system to async when I realized the synchronous version is 20x faster than the async version. I was able to create a very simple example to demonstrate this in a repeatable way;
Asynchronous Version
import asyncio, time
data = {}
async def process_usage(key):
data[key] = key
async def main():
await asyncio.gather(*(process_usage(key) for key in range(0,1000000)))
s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This takes 19 seconds. The code loops through 1M keys and builds a dictionary, data
with the same key and value.
$ python3.7 async_test.py
Took 19.08 seconds.
Synchronous Version
import time
data = {}
def process_usage(key):
data[key] = key
def main():
for key in range(0,1000000):
process_usage(key)
s = time.perf_counter()
results = main()
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This takes 0.17 seconds! And does exactly the same thing as above.
$ python3.7 test.py
Took 0.17 seconds.
Asynchronous Version with create_task
import asyncio, time
data = {}
async def process_usage(key):
data[key] = key
async def main():
for key in range(0,1000000):
asyncio.create_task(process_usage(key))
s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This version brings it down to 11 seconds.
$ python3.7 async_test2.py
Took 11.91 seconds.
Why does this happen?
In my production code I will have a blocking call in process_usage
where I save the value of key to a redis database.