
I'm experimenting with named pipes and async approaches and was a bit surprised by how slow reading the file I've created seems to be.

And as this question suggests, this effect is not limited to named pipes as in the example below but applies to 'normal' files as well. Since my final goal is reading those named pipes, I'd prefer to keep the examples below.

So here is what I initially came up with:

import sys, os
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL
from aiofile import async_open

async def read_strace(namedpipe):
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)

async def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)  # mkfifo returns None, nothing to assign
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd, 
            stdout=DEVNULL, stderr=DEVNULL)
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")

run(main(sys.argv[1:]))

You can run it like ./async_program.py <CMD>, e.g. ./async_program.py find .

This one uses default Popen and reads what strace writes to myfifo:

from subprocess import Popen, DEVNULL
import sys, os

def read_strace(namedpipe):
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)
   
def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)  # mkfifo returns None, nothing to assign
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        read_strace("myfifo")  # stray trailing comma removed
        process.wait()
    finally:
        os.unlink("myfifo")

main(sys.argv[1:])

Running both programs with time reveals that the async program is about 15x slower:

$ time ./async_program.py  find .   
poetry run ./async_program.py find .  4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find .  0.27s user 0.07s system 76% cpu 0.438 total

The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm fairly sure I could get close to the synchronous approach by using an extra thread and writing to a queue, but admittedly I haven't tried it yet.
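For reference, the thread-plus-queue variant I have in mind could look roughly like this, a minimal sketch, not tested against the strace setup; `read_lines` and the `None` EOF sentinel are my own naming, not from any library:

```python
import asyncio
import threading

def _reader_thread(path, queue, loop):
    # Blocking line reads happen in this worker thread, off the event loop.
    with open(path, "r") as fp:
        for line in fp:
            loop.call_soon_threadsafe(queue.put_nowait, line)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: EOF

async def read_lines(path):
    # Async generator: yields lines as the worker thread delivers them.
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    threading.Thread(
        target=_reader_thread, args=(path, queue, loop), daemon=True
    ).start()
    while (line := await queue.get()) is not None:
        yield line
```

The consumer side would then be `async for line in read_lines("myfifo"): ...`, with only one thread hop per line instead of aiofile's per-read machinery.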

Is there a recommended way to read a file asynchronously - perhaps even an approach better suited to named pipes as used in the example above?

frans
  • I get `OSError: [Errno 29] Illegal seek (... from ...caio/python_aio.py")` when running your async version – RomanPerekhrest Feb 18 '23 at 15:41
  • 1
    [aiofile](https://github.com/mosquito/aiofile) and [aiofiles](https://github.com/Tinche/aiofiles/) are different, btw. The latter being more feature-rich & better maintained. – felipe Feb 22 '23 at 08:53

1 Answer


So async isn't magic. What async is good at is situations where you are calling something, or something is calling you, usually remotely, and there are I/O delays to overlap: network latency, disk waits, slow peers, etc.

In your case, there is hardly any I/O wait to overlap: a single process is reading a single file (named pipe or not).

So all your async is doing here is ADDING overhead: every read is wrapped up, handed to the event loop, and resumed again, repeatedly.
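That said, asyncio can watch a pipe's file descriptor natively via `loop.connect_read_pipe`, so the loop only wakes when data is ready, with no per-read thread or executor hop. A minimal, Unix-only sketch; `pipe_lines` is my own helper name, not part of any library:

```python
import asyncio

async def pipe_lines(pipe):
    # Register an already-open binary pipe/FIFO object with the event loop
    # and yield its decoded lines until EOF.
    loop = asyncio.get_running_loop()
    reader = asyncio.StreamReader()
    transport, _ = await loop.connect_read_pipe(
        lambda: asyncio.StreamReaderProtocol(reader), pipe
    )
    try:
        async for line in reader:
            yield line.decode()
    finally:
        transport.close()
```

For a FIFO you would open it with `os.open(path, os.O_RDONLY | os.O_NONBLOCK)` and wrap the descriptor with `os.fdopen(fd, "rb")` before passing it in, so the open doesn't block waiting for a writer.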

Frank Wiles
  • That's true, but it doesn't explain why it's 15x slower - I've written other programs that read streams asynchronously with close to no overhead. Aside from that, these are stripped-down examples - the original program does other things too, e.g. reading stdin and stdout of the process I've started. After all, I'm not trying to speed things up, but to remove complexity and potential deadlocks from an otherwise thread-based approach. – frans Feb 18 '23 at 16:24