
My code is:

import os
import h5py
import asyncio
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
load_dotenv()

data_path = os.environ['DATA_PATH']
output_data_path = os.environ['OUTPUT_DATA_PATH']
patient_files = os.listdir(data_path)


async def save_file(filename, image_data):
    plt.imshow(image_data, cmap='gray')
    await plt.savefig(filename, pad_inches=0, bbox_inches='tight')

for patient_file in patient_files:
    patient_parts = patient_file.split('.')
    patient_parts = patient_parts[0].split('_')
    patient_id = patient_parts[1]
    if int(patient_id) < 21:
        continue
    print('Doing patient', patient_id)

    patient_data = h5py.File(os.path.join(data_path, patient_file))
    variables = patient_data.items()

    for var in variables:
        name = var[0]
        data = var[1]

        if type(data) is h5py.Dataset:
            value = data.value
            plt.axis('off')
            if name == 'Svar25':
                for layer in range(value.shape[0]):
                    output_file = os.path.join(
                        output_data_path, patient_id + '_FLAIR_debone_' + str(layer) + '.png')
                    save_file(filename=output_file, image_data=value[layer])

            if name == 'Svar24':
                for layer in range(value.shape[0]):
                    output_file = os.path.join(
                        output_data_path, patient_id + '_FLAIR_bone_' + str(layer) + '.png')
                    save_file(filename=output_file, image_data=value[layer])

I'm trying to write my files asynchronously, but it doesn't actually write.

What am I doing wrong?

Shamoon

2 Answers


You forgot to put await in front of save_file(...). Calling a coroutine function without await only creates a coroutine object; its body never runs, so the file is never written.

Very common mistake in my experience...
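To make the point concrete, here is a minimal sketch. The save_file below is a hypothetical stand-in that only records its argument (it does not touch matplotlib), so the effect of the missing await is easy to see:

```python
import asyncio

calls = []  # records which "saves" actually ran


async def save_file(filename):
    # Hypothetical stand-in for the real save; just notes that the body ran.
    calls.append(filename)


async def main():
    coro = save_file("a.png")   # no await: this only builds a coroutine object
    before = list(calls)        # nothing has run yet
    await coro                  # awaiting is what executes the body
    after = list(calls)
    return before, after


before, after = asyncio.run(main())
print(before, after)
```

Python also emits a "coroutine ... was never awaited" RuntimeWarning when such an object is discarded without being awaited, which is a useful hint when hunting for this mistake.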

Update: From your comment, it appears you never start the asyncio event loop, so nothing ever runs your coroutines. You'll need to run a top-level coroutine in the loop, as per docs. Python 3.7 added asyncio.run() as a simpler way to do this; the principle is the same in 3.5 and 3.6, just a bit more long-winded.

>>> import asyncio

>>> async def main():
...     print('hello')
...     await asyncio.sleep(1)
...     print('world')

>>> asyncio.run(main())
hello
world
Paul Annetts
  • Do I have to make my entire code somehow `async` then? – Shamoon Feb 15 '19 at 17:40
  • Yeah, pretty much, async does tend to spread like a virus through your code. You need to kick off the asyncio loop. Then you can do cool stuff like running the saves in parallel, but using a single thread. – Paul Annetts Feb 15 '19 at 17:50
  • My main code isn't in a function, so how can I kick off the `async` portion? – Shamoon Feb 15 '19 at 17:52
  • You'll need to put it in a function. Global variables and async is kind of asking for trouble anyhow ;) – Paul Annetts Feb 15 '19 at 17:55
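As a rough sketch of what the comments describe (top-level code moved into a function, and several saves in flight at once on a single thread), something like the following could work. Here save_layer is a hypothetical coroutine, and asyncio.sleep stands in for an awaitable I/O call; neither comes from the question's code:

```python
import asyncio


async def save_layer(layer, done):
    # asyncio.sleep is a placeholder for real awaitable I/O.
    await asyncio.sleep(0.01)
    done.append(layer)


async def main():
    done = []
    # gather schedules all coroutines on the loop and waits for every one.
    await asyncio.gather(*(save_layer(i, done) for i in range(3)))
    return done


result = asyncio.run(main())
print(sorted(result))
```

Note that this only overlaps the waiting; it helps when the work inside each coroutine actually yields to the event loop.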

What am I doing wrong?

await plt.savefig(filename, pad_inches=0, bbox_inches='tight')

A function doesn't become async just because you put await in front of it. It has to be declared with async def to be a coroutine function in the first place.

Any function that uses await must be defined with async def, all the way up to the entry point of your program (for example, async def main()). You then have to run that entry point in an event loop for the async code to work.

Both plt.imshow and plt.savefig are synchronous by nature. You can try wrapping the second one with run_in_executor and awaiting the result, but I'm not sure it will work. I'm also not sure you need it: you probably won't gain any speedup by parallelizing these disk I/O operations.
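For reference, the run_in_executor idea could look roughly like this. It is only a sketch under assumptions: blocking_save is a hypothetical stand-in that writes bytes with plain file I/O, where the real code would call plt.savefig, and whether pyplot behaves well when called from worker threads is a separate question this does not answer.

```python
import asyncio
import os
import tempfile


def blocking_save(path, payload):
    # Hypothetical stand-in for a synchronous call such as plt.savefig.
    with open(path, "wb") as f:
        f.write(payload)
    return path


async def save_file(path, payload):
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_save, path, payload)


async def main(out_dir):
    paths = [os.path.join(out_dir, f"layer_{i}.bin") for i in range(3)]
    # Each blocking save runs in a worker thread; gather waits for all of them.
    return await asyncio.gather(*(save_file(p, b"data") for p in paths))


with tempfile.TemporaryDirectory() as out_dir:
    written = asyncio.run(main(out_dir))
    all_exist = all(os.path.exists(p) for p in written)
```

The synchronous function stays synchronous; the executor just moves it off the event loop thread so that it can be awaited.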


Long story short, please read this answer to better understand why people use asyncio in the first place. I also advise you to read this part of the documentation and to start with a more trivial async task.

Mikhail Gerasimov