
I've just finished reading a couple of RealPython tutorials on asyncio, which explain that asyncio (and threading) are great for I/O-bound work. The tutorials mainly use asyncio.sleep() and the aiohttp module's asynchronous session.get(url) to represent I/O 'work' during which we'd like to hand control back to other tasks.
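For context, the pattern those tutorials build on looks roughly like this (a minimal sketch from memory rather than their exact code; the task names and delays are made up):

import asyncio

async def fake_io(name, delay):
    # asyncio.sleep() stands in for any awaitable I/O operation; while this
    # task is suspended, the event loop is free to run the other tasks
    print(f"{name} started")
    await asyncio.sleep(delay)
    print(f"{name} finished")

async def demo():
    # the three waits overlap, so this finishes in ~2s rather than ~4.5s
    await asyncio.gather(fake_io("A", 2), fake_io("B", 1), fake_io("C", 1.5))

asyncio.run(demo())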

I saw 'I/O bound' and thought I'd try writing large-ish files, the idea being to reduce the time taken to write 10 large files by running them asynchronously instead of synchronously, since we are otherwise left waiting while each file is written. I thought this would be analogous to awaiting a response from session.get(url), only against the file system. However, I can't get any performance benefit over writing the files synchronously, so either I've implemented the asynchronous file writing (using aiofile) wrongly, or I've misunderstood something and file writing can't usefully be made asynchronous - my first guess was that there is a maximum file-writing speed/capacity and it is already saturated while the first task is executing. A third possibility is that the implementation is correct but simply happens to take longer than the synchronous one in this case, though I find that unlikely.

Below is the code I've written to compare the synchronous and asynchronous implementations of writing a large file. Any help explaining where I've gone wrong in implementation or understanding would be amazing.

Imports

import asyncio
from aiofile import AIOFile
from codetiming import Timer

Synchronous File Writing

def write_file(text):
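    # note: every call reopens and overwrites the same somefile.txt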
    with open('somefile.txt', 'w') as file:
        print(f'Writing file {text[:20]}')
        file.write(text)

def main_sync():
    timer = Timer(text=f"Task elapsed time: {{:.1f}}")
    timer.start()
    for text in [
        "http://google.com",
        "http://yahoo.com",
        "http://linkedin.com",
        "http://apple.com",
        "http://microsoft.com",
        "http://facebook.com",
        "http://twitter.com",
    ]:
        print(f"Task writing file: {text[:20]}")
        write_file(text*40000000)
    timer.stop()

Async File Writing

async def aio_file_write(text):
    async with AIOFile("somefile.txt", 'w') as file:
        print(f'Writing file {text[:20]}')
        await file.write(text)
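        # fsync forces the written data out to disk instead of leaving it in the OS cache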
        await file.fsync()

async def task(name, work_queue):
    timer = Timer(text=f"Task {name} elapsed time: {{:.1f}}")
    while not work_queue.empty():
        text = await work_queue.get()
        print(f"Task {name} writing file: {text[:20]}")
        timer.start()
        await aio_file_write(text)
        timer.stop()

async def main():
    work_queue = asyncio.Queue()

    for text in [
        "http://google.com",
        "http://yahoo.com",
        "http://linkedin.com",
        "http://apple.com",
        "http://microsoft.com",
        "http://facebook.com",
        "http://twitter.com",
    ]:
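        # same 40-million-fold repetition as the synchronous version, so each queue item is several hundred MB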
        await work_queue.put(text*40000000)

    with Timer(text="\nTotal elapsed time: {:.1f}"):
        await asyncio.gather(
            asyncio.create_task(task("One", work_queue)),
            asyncio.create_task(task("Two", work_queue)),
        )

Script

if __name__ == "__main__":
    asyncio.run(main())
else:
    main_sync()
  • You say *"the idea being to reduce the time taken to write 10 large files by running them asynchronously"* - that won't work. The time needed to write the data will stay the same. You can only prevent your main thread from blocking while this is being done, or do other things while you wait, but it won't speed up the write. – Tomalak May 25 '20 at 16:00
  • The limiting factor for writing to files is usually the filesystem/hardware. Unless you are writing to multiple filesystems at the same time, async will not provide better maximum performance. – MisterMiyagi May 25 '20 at 16:00
  • Take note that the main benefits of ``async`` are high-concurrency applications with many context switches. That's the domain of some 1k, or rather 10k and more, tasks. For just 2 tasks, ``async`` does not have any benefit over e.g. threads – which *also* suspend on I/O to let other threads run (see the thread-based sketch after these comments). – MisterMiyagi May 25 '20 at 16:07
  • Does this answer your question? [Does asyncio supports asynchronous I/O for file operations?](https://stackoverflow.com/questions/34699948/does-asyncio-supports-asynchronous-i-o-for-file-operations) – ggorlen Apr 11 '21 at 02:40
  • See also [Asynchronous file writing possible in python?](https://stackoverflow.com/questions/319132/asynchronous-file-writing-possible-in-python) – ggorlen Apr 11 '21 at 02:40
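For comparison, a thread-based version of the experiment (the alternative MisterMiyagi alludes to) could look roughly like the sketch below. This is hypothetical illustration code, not taken from the question; it writes each payload to its own file and uses a hypothetical write_one() helper.

import concurrent.futures
from codetiming import Timer

def write_one(index, text):
    # plain blocking write; CPython releases the GIL around the underlying
    # OS write call, so the other worker thread can run in the meantime
    with open(f"somefile_{index}.txt", "w") as file:
        file.write(text)

def main_threaded():
    urls = ["http://google.com", "http://yahoo.com"]
    payloads = [url * 40000000 for url in urls]
    with Timer(text="Threaded elapsed time: {:.1f}"):
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
            # map blocks until every write has completed
            list(pool.map(write_one, range(len(payloads)), payloads))

if __name__ == "__main__":
    main_threaded()

As the comments above note, this typically won't beat writing the files one after another either, because the disk's write throughput is the shared bottleneck; concurrency only keeps the rest of the program responsive while the writes are in flight.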
