
My program does the following:

  1. Take a folder of .txt files.
  2. For each file:

    2.1. Read the file.

    2.2. Sort the contents as a list and push the list onto a master list.
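
In code, the synchronous version is roughly this (a minimal sketch, equivalent in structure to my actual code):

import os

directory = "/tmp"
listOfLists = list()

for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        with open(os.path.join(directory, filename)) as fin:
            numbersInList = [int(line) for line in fin]   # parse every line
        numbersInList.sort()                              # sort this file's numbers
        listOfLists.append(numbersInList)                 # push onto the master list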

I did this without any async/await, and these are the time statistics:

real    0m0.036s

user    0m0.018s

sys     0m0.009s

With the async/await code below I get

real    0m0.144s

user    0m0.116s

sys     0m0.029s

which, given the use case, suggests that I am using asyncio incorrectly.

Anybody have an idea what I am doing wrong?

import asyncio
import aiofiles
import os

directory = "/tmp"
listOfLists = list()

async def sortingFiles(numbersInList):
    numbersInList.sort()

async def awaitProcessFiles(filename, numbersInList):
    # read, sort, then append -- one chain per file
    await readFromFile(filename, numbersInList)
    await sortingFiles(numbersInList)
    await appendToList(numbersInList)

async def readFromFile(filename, numbersInList):
    # aiofiles hands control back to the event loop on every line read;
    # the async with block closes the file on exit
    async with aiofiles.open(os.path.join(directory, filename), 'r') as fin:
        async for line in fin:
            numbersInList.append(int(line.strip("\n"), 10))

async def appendToList(numbersInList):
    listOfLists.append(numbersInList)

async def main():
    # schedule one task per .txt file and run them all concurrently
    tasks = []
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):
            numbersInList = list()
            task = asyncio.ensure_future(awaitProcessFiles(filename, numbersInList))
            tasks.append(task)
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

Profiling info:

        151822 function calls (151048 primitive calls) in 0.239 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       11    0.050    0.005    0.050    0.005 {built-in method _imp.create_dynamic}
       57    0.022    0.000    0.022    0.000 {method 'read' of '_io.BufferedReader' objects}
       57    0.018    0.000    0.018    0.000 {built-in method io.open_code}
      267    0.012    0.000    0.012    0.000 {method 'control' of 'select.kqueue' objects}
       57    0.009    0.000    0.009    0.000 {built-in method marshal.loads}
      273    0.009    0.000    0.009    0.000 {method 'recv' of '_socket.socket' objects}
      265    0.005    0.000    0.098    0.000 base_events.py:1780(_run_once)
      313    0.004    0.000    0.004    0.000 {built-in method posix.stat}
      122    0.004    0.000    0.004    0.000 {method 'acquire' of '_thread.lock' objects}
  203/202    0.003    0.000    0.011    0.000 {built-in method builtins.__build_class__}
     1030    0.003    0.000    0.015    0.000 thread.py:158(submit)
     1030    0.003    0.000    0.009    0.000 futures.py:338(_chain_future)
     7473    0.003    0.000    0.003    0.000 {built-in method builtins.hasattr}
     1030    0.002    0.000    0.017    0.000 futures.py:318(_copy_future_state)
       36    0.002    0.000    0.002    0.000 {built-in method posix.getcwd}
     3218    0.002    0.000    0.077    0.000 {method 'run' of 'Context' objects}
     6196    0.002    0.000    0.003    0.000 threading.py:246(__enter__)
     3218    0.002    0.000    0.078    0.000 events.py:79(_run)
     6192    0.002    0.000    0.004    0.000 base_futures.py:13(isfuture)
     1047    0.002    0.000    0.002    0.000 threading.py:222(__init__)

Make some test files...

import random, os

path = <directory name here>
nlines = range(1000)
nfiles = range(1, 101)

for n in nfiles:
    fname = f'{n}.txt'
    with open(os.path.join(path, fname), 'w') as f:
        for _ in nlines:
            f.write(f'{random.randrange(1, 10000)}\n')
  • If your code is CPU-bound (i.e., the majority of the time is spent sorting), this result is reasonable, since async adds overhead. Have you profiled to see exactly what is taking the time? – PiRocks Feb 02 '20 at 16:49
  • How many files? How many lines in the files (you are sorting lines?)? It seems like you should only need async for opening and reading the files; the sorting and accumulating can be done in the same coroutine as the open and read. I would probably have done this with `concurrent.futures`. – wwii Feb 02 '20 at 16:52
  • Currently tested with 10 files with approximately 1000 lines each. – rrgirish Feb 02 '20 at 16:57
  • @PiRocks added profiling info to the description. I thought the whole point of doing the async was to do non-blocking I/O. I also removed the async from sorting and adding to the list. Still, the performance is not as good as the non-async one. – rrgirish Feb 02 '20 at 17:16
  • There are some errors in your code - please provide a [mcve]. – wwii Feb 02 '20 at 17:16
  • @wwii updated the code. Not sure how you can reproduce completely without the input .txt files. – rrgirish Feb 02 '20 at 17:22
  • 1000-line files aren't very big; it's possible the overhead of managing asynchronous code is greater than the savings you get by interleaving processing and IO. – chepner Feb 02 '20 at 17:41
  • With 100 files of 100k lines each, a `concurrent.futures` or `asyncio` approach is still slower than just iterating over the files and processing them. – wwii Feb 02 '20 at 18:27
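
For reference, a `concurrent.futures` version along the lines wwii describes might look like this (a sketch, assuming the same /tmp layout as above; `process_file` is a name I made up):

import os
from concurrent.futures import ProcessPoolExecutor

directory = "/tmp"

def process_file(filename):
    # read, parse, and sort one file; runs in a separate worker process
    with open(os.path.join(directory, filename)) as fin:
        numbers = [int(line) for line in fin]
    numbers.sort()
    return numbers

if __name__ == "__main__":
    txt_files = [f for f in os.listdir(directory) if f.endswith(".txt")]
    with ProcessPoolExecutor() as pool:
        listOfLists = list(pool.map(process_file, txt_files))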

1 Answer


asyncio makes little sense for local files. That is why even the Python standard library does not provide async file I/O.

async for line in fin:

Consider the line above. The event loop pauses the coroutine on every line read and executes some other coroutine, which means the lines of the file already sitting in the CPU cache are thrown away to make room for the next coroutine (they will still be in RAM, though).
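
You can observe that overhead directly with a quick micro-benchmark (a sketch; the path is an assumption, point it at any of the test files generated in the question):

import asyncio
import time
import aiofiles

path = "/tmp/1.txt"  # assumed test file

async def read_async():
    async with aiofiles.open(path) as fin:
        async for line in fin:  # each iteration is a round trip through the event loop
            pass

def read_sync():
    with open(path) as fin:
        for line in fin:        # plain buffered iteration, no event loop involved
            pass

t0 = time.perf_counter()
asyncio.run(read_async())
t1 = time.perf_counter()
read_sync()
t2 = time.perf_counter()
print(f"async: {t1 - t0:.4f}s  sync: {t2 - t1:.4f}s")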

When should aiofiles be used?

Suppose you already use async code in your program and occasionally have to do some file processing. If the file processing were done on the same event loop, all the other coroutines would be blocked. In that case you can either use aiofiles or do the processing in a different executor.
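
The executor route might look roughly like this (a sketch using `loop.run_in_executor` with the default thread pool; `read_and_sort` is my name):

import asyncio
import os

directory = "/tmp"

def read_and_sort(filename):
    # ordinary blocking code; it runs in a worker thread, so the event loop stays free
    with open(os.path.join(directory, filename)) as fin:
        numbers = [int(line) for line in fin]
    numbers.sort()
    return numbers

async def main():
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, read_and_sort, f)
             for f in os.listdir(directory) if f.endswith(".txt")]
    return await asyncio.gather(*tasks)

listOfLists = asyncio.run(main())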

If all the program is doing is reading from files, it will be faster to read them sequentially, which makes good use of the cache. Jumping from one file to another is like a thread context switch and should make it slower.

  • Just to add to your answer: I think the use of asyncio here is inappropriate. asyncio is for IO-bound processes. Here we're doing a number of list operations, like append and sort; unless I am mistaken, these are not IO-bound. Using a coroutine for these operations doesn't really make sense with that in mind. – PirateNinjas Feb 03 '20 at 09:56
  • For really large files, reading them can become the bottleneck if you don't do much processing with the data you read, but I agree that this might be difficult to achieve given Python's performance. – Tom Pohl Jun 22 '21 at 18:10