0

I have an IO operation (POST request for each Pandas row) which I'm trying to speed up using Async IO. An alternative minimal example can be found below. I want to understand why the 1st sample doesn't run in parallel and 2nd sample is faster.

1st Sample:

from time import sleep
import asyncio
import nest_asyncio
nest_asyncio.apply()


async def add(x: int, y: int, delay: int):
  sleep(delay) #await asyncio.sleep(delay)
  return x+y
  

async def get_results():
  inputs = [(2,3,9), (4,5,7), (6,7,5), (8,9,3)]
  
  cors = [add(x,y,z) for x,y,z in inputs]
  results = asyncio.gather(*cors)
    
  print(await results)

asyncio.run(get_results())

# This takes ~24s

2nd Sample:

from time import sleep
import asyncio
import nest_asyncio
nest_asyncio.apply()


async def add(x: int, y: int, delay: int):
  await asyncio.sleep(delay)
  return x+y
  

async def get_results():
  inputs = [(2,3,9), (4,5,7), (6,7,5), (8,9,3)]
  
  cors = [add(x,y,z) for x,y,z in inputs]
  results = asyncio.gather(*cors)
    
  print(await results)

asyncio.run(get_results())

# This takes ~9s as expected

In my case the sleep can be replace with requests.post()

Nishanth
  • 6,932
  • 5
  • 26
  • 38
  • Not that just like time.sleep is not async, neither is requests. If you want to use asyncio, you’ll need to use a library like aiohttp instead. – dirn Oct 18 '22 at 13:41

3 Answers3

1

requests isn't async aware, so trying to use it in an async function doesn't make anything faster.

You will need an async aware HTTP library such as httpx or aiohttp to make HTTP requests that don't block async functions.

Similarly, in the first example, you're using a non-async-aware function, time.sleep, which blocks the async loop.

Also, a non-IO, Python-native-code operation (an addition) will not be sped up by asyncio.

(Recall that async doesn't mean that the functions would be run in parallel, quite the contrary.)

AKX
  • 152,115
  • 15
  • 115
  • 172
  • Makes sense. I tried multi-process and it does speedup requests. – Nishanth Oct 18 '22 at 10:01
  • It's also an entirely different thing with an entirely different set of tradeoffs. You will have a bad time performance-wise if you try to move a lot of data in and out of multiprocessing children. – AKX Oct 18 '22 at 10:26
  • What is recommended for sending payload row-by-row from a pandas dataframe? – Nishanth Oct 18 '22 at 10:29
  • There's no single recommended way. – AKX Oct 18 '22 at 10:56
0

Because the time.sleep is not async. Thread will be blocking and waiting when running to the code time.sleep.

However await asyncio.sleep is async. Thread will be free to execute other code when running to the code await asyncio.sleep.

sync just like you must order a food and waiting it completion. Then you can oder next food.

async means you can order all food in a time and waiting them completion.

J.C
  • 233
  • 3
  • 7
0

As soon as your code hits sleep.delay it will block all other co-routines from executing in the event loop. All function have to be async compatible

Foton
  • 97
  • 1
  • 9