I'm experimenting with named pipes and async approaches and was a bit surprised how slow reading the file I've created seems to be.
As this question suggests, the effect is not limited to named pipes as in the example below; it applies to 'normal' files as well. Since my final goal is reading named pipes, I prefer to keep the examples below.
So here is what I initially came up with:
import sys, os
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL
from aiofile import async_open

async def read_strace(namedpipe):
    # Copy everything strace writes to the FIFO into async.log.
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)

async def main(cmd):
    try:
        os.mkfifo('myfifo', 0o600)  # returns None, so nothing to keep
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        # Read the FIFO while waiting for strace to exit.
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")

run(main(sys.argv[1:]))
You can run it like ./async_program.py <CMD>
e.g. ./async_program.py find .
This one, the synchronous counterpart, uses plain Popen and reads what strace writes to myfifo:
from subprocess import Popen, DEVNULL
import sys, os

def read_strace(namedpipe):
    # Copy everything strace writes to the FIFO into sync.log.
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)

def main(cmd):
    try:
        os.mkfifo('myfifo', 0o600)  # returns None, so nothing to keep
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        # Blocks until strace closes its end of the FIFO.
        read_strace("myfifo")
    finally:
        os.unlink("myfifo")

main(sys.argv[1:])
Running both programs with time reveals that the async program is about 15x slower:
$ time ./async_program.py find .
poetry run ./async_program.py find . 4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find . 0.27s user 0.07s system 76% cpu 0.438 total
The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm pretty sure I could still come close to the synchronous approach by using an extra thread and writing to a queue, but admittedly I haven't tried it yet.
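To make the thread-and-queue idea concrete, this is roughly what I have in mind. It is only a sketch and has not been benchmarked; pump_lines, read_via_thread and the None end-of-stream sentinel are names and conventions I made up for illustration, and the queue is unbounded for simplicity.

```python
import asyncio
import threading


def pump_lines(path, loop, queue):
    """Blocking reader; runs in a worker thread, not on the event loop."""
    with open(path, "r") as fp:  # blocks until a writer opens the FIFO
        for line in fp:
            # asyncio.Queue is not thread-safe, so hand each line to the loop.
            loop.call_soon_threadsafe(queue.put_nowait, line)
    # None is a sentinel (an assumption of this sketch) marking end of stream.
    loop.call_soon_threadsafe(queue.put_nowait, None)


async def read_via_thread(path, outpath):
    """Copy lines from path to outpath; the blocking reads happen off-loop."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    worker = threading.Thread(
        target=pump_lines, args=(path, loop, queue), daemon=True)
    worker.start()
    count = 0
    with open(outpath, "w") as outfp:
        # Consume lines until the worker signals end of stream.
        while (line := await queue.get()) is not None:
            outfp.write(line)
            count += 1
    worker.join()  # the worker has already sent the sentinel, so this is quick
    return count
```

In the async program above, this would replace read_strace in the gather call, i.e. gather(read_via_thread("myfifo", "async.log"), process.wait()).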
Is there a recommended way to read a file asynchronously - maybe even an approach more dedicated to named pipes as I use them in the given example?