
I am writing a data analysis pipeline. I have the following directory structure, where each folder (image1, image2, ... image100) within "data" contains the data for one 3D image:

  • data_pipeline.py
  • data
    • image1
      • raw_data
        • slice1
        • slice2
        • ...
      • processed_data
      • image_specific_codes
        • __init__.py
        • open_slices.py (containing the unpack_slices() function for image1)
    • image2
      • raw_data
        • slice1_to_5
        • slice6_to_10
        • ...
      • processed_data
      • image_specific_codes
        • __init__.py
        • open_slices.py (containing the unpack_slices() function for image2)

The analysis pipeline is mostly the same for each 3D image (image1, image2, ...), except for unpacking the raw data. I am looking for a way to create a general pipeline, data_pipeline.py, that loops through all the folders in "data" (image1, image2, etc.), unpacks each one using its own open_slices.py, and runs it through the rest of the pipeline.

I have tried using importlib to dynamically import specific functions, for example:

import os
import importlib

for i_image in os.listdir():
    os.chdir(i_image)
    module = importlib.import_module('.open_slices', package='image_specific_codes')
    unpack_fn = getattr(module, 'unpack_slices')
    unpacked_image = unpack_fn()

    # At this point, unpacked_image is in a consistent format for the analysis pipeline
    # ...

but I am never able to import the modules successfully, getting errors such as the one below:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'image_specific_codes'
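One likely cause of this error: `import_module` looks up the top-level name `image_specific_codes` on `sys.path`, and `os.chdir` does not change `sys.path` (when a script is run, `sys.path[0]` is the script's own directory, not the working directory). Even once the first import succeeds, the package is cached in `sys.modules`, so later folders would silently get image1's code. A minimal sketch that keeps the package-based import, assuming every image folder has the `image_specific_codes` layout shown above; `prepended_to_sys_path` and `load_unpacker` are hypothetical helper names:

```python
import contextlib
import importlib
import sys
from pathlib import Path


@contextlib.contextmanager
def prepended_to_sys_path(directory):
    """Temporarily put `directory` at the front of sys.path (hypothetical helper)."""
    entry = str(directory)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        # Removes the first matching entry, i.e. the one inserted above.
        sys.path.remove(entry)


def load_unpacker(image_dir, package="image_specific_codes"):
    """Import <image_dir>/<package>/open_slices.py and return its unpack_slices."""
    # Forget any copy of the package imported from a previous image folder;
    # otherwise sys.modules would hand back image1's code for every image.
    for name in list(sys.modules):
        if name == package or name.startswith(package + "."):
            del sys.modules[name]
    with prepended_to_sys_path(Path(image_dir).resolve()):
        importlib.invalidate_caches()  # notice folders/files created after startup
        module = importlib.import_module(".open_slices", package=package)
    return module.unpack_slices
```

No `os.chdir` is involved, so there is no working directory to restore between iterations.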
  • "I have tried using importlib to dynamically import specific functions" ok, and then what happened? – juanpa.arrivillaga Jun 02 '23 at 21:46
  • @juanpa.arrivillaga importlib does not recognize the 'image_specific_codes' directory as a module (I've added the error to the question). I didn't add a specific error message at first, since I tried every way I could conceive of using importlib.import_module and got a different error message each time. I was hoping the question would lead to a more general solution to the issue than a fix for this specific error. – Miguel Romanello Jun 02 '23 at 22:03
  • Please, in general, **always** be as specific as possible. Strive to provide a [mcve]. And since you are relying on an argumentless `os.listdir` you would need to at least tell us what your working directory is. And you give us no debugging information. Anyway, have you tried the following approach: https://stackoverflow.com/questions/67631/how-can-i-import-a-module-dynamically-given-the-full-path – juanpa.arrivillaga Jun 02 '23 at 22:54
  • Anyway, this approach, which relies on changing the working directory, should work; it's hard to say why it *isn't* working in your case (again, you've provided no debugging information, and not adequate information about how you are actually executing this), although you would also probably have to invalidate the import caches if you actually got past that initial hurdle. So, all that being said, I would go with the approach in the above link, which imports a module given its path; it will work assuming those modules are self-contained, and it would be less brittle – juanpa.arrivillaga Jun 02 '23 at 23:01
  • Also, I just realized, `os.chdir(i_image)` will fail once you've changed the directory already once. Again, the sort of problem that arises when doing this this way (you could write a context manager to get you back to your original directory, but again, maybe just use the import by path approach instead) – juanpa.arrivillaga Jun 02 '23 at 23:08
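The import-by-path approach linked in the comments avoids both `os.chdir` and `sys.path`. A minimal sketch, assuming each `open_slices.py` is self-contained (no relative imports from its package); `load_module_from_path` and `iter_unpackers` are hypothetical names, and each module is registered under a distinct name so image1 and image2 do not overwrite each other in `sys.modules`:

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Load a single .py file as a module, without touching sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as importlib does
    spec.loader.exec_module(module)
    return module


def iter_unpackers(data_dir):
    """Yield (folder name, unpack_slices function) for every image folder."""
    for image_dir in sorted(Path(data_dir).iterdir()):
        script = image_dir / "image_specific_codes" / "open_slices.py"
        if not script.is_file():
            continue  # skip stray files and folders without image-specific code
        # A unique module name per folder, e.g. "open_slices_image1".
        module = load_module_from_path(f"open_slices_{image_dir.name}", script)
        yield image_dir.name, module.unpack_slices
```

In `data_pipeline.py` the loop then becomes `for name, unpack_fn in iter_unpackers("data"): unpacked_image = unpack_fn()`, followed by the shared analysis steps.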

1 Answer


The way you are proposing seems a bit overly complicated. Having an individual script for each image seems unnecessary and even though you did not specify what is in the open_slices.py scripts, I am pretty sure they have a lot in common and some might even be identical across images.

My suggestion would be the following:

  • define a single unpacking function (or class) outside of the data folder
  • replace the open_slices.py files with parameter files (JSON is probably your best bet) containing the parameters to pass to the unpacking function when processing the image in that folder
  • import the unpacking function into the main script and run it on each folder with that folder's parameters.
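The suggestion above might look like the following minimal sketch. The parameter file name `unpack_params.json`, its keys (`files`, `slices_per_file`) and the function signatures are hypothetical, and instead of reading pixel data the function only returns which file and position each slice comes from:

```python
import json
from pathlib import Path


def unpack_slices(raw_dir, files, slices_per_file=1):
    """Hypothetical generic unpacker driven entirely by parameters.

    Returns (file_path, position_in_file) pairs in slice order; a real
    version would read the pixel data stored at each position.
    """
    plan = []
    for file_name in files:
        for position in range(slices_per_file):
            plan.append((Path(raw_dir) / file_name, position))
    return plan


def unpack_image(image_dir):
    """Read the folder's parameter file and pass it to the shared unpacker."""
    image_dir = Path(image_dir)
    params = json.loads((image_dir / "unpack_params.json").read_text())
    return unpack_slices(image_dir / "raw_data", **params)
```

For image2, the parameter file could then contain `{"files": ["slice1_to_5", "slice6_to_10"], "slices_per_file": 5}`, while image1 would use `{"files": ["slice1", "slice2"]}`.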

This way, the code should become more organized and more flexible to new changes. For more in-depth help, please share what the so-called unpacking functions are doing.

  • Unpacking the data using parameters from a .json/.pkl/.txt file is unfeasible. Some files include more than one slice, some don't. For one image, for example, there is a file that includes slices [0, 2, 4, 6], and another that includes [0, 1, 1, 3, 5] (as in, slices 0 and 1 were acquired twice, sometimes in the same file, sometimes not. The duplicates have to be either deleted or averaged before the slices are combined into a single 3D image, depending on how closely they match). Pretty much every image is different. I've done lots of batch processing from parameter files, but this time it is not a reasonable solution. – Miguel Romanello Jun 02 '23 at 21:54
  • Especially given that data is still being acquired and downloaded, I don't even know yet in which other ways I will have to modify the data during the unpacking process. – Miguel Romanello Jun 02 '23 at 21:58
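Even when the file layout differs for every image, the duplicate-slice rule described in the comments (average repeats that match, otherwise drop the extras) could live in one shared helper that each image's `unpack_slices()` calls. A minimal sketch, assuming slices arrive as `(slice_index, pixels)` pairs with `pixels` as equal-length flat lists of floats; the similarity measure (mean absolute difference), the `tolerance` value and the "keep the first acquisition" policy are all hypothetical choices:

```python
from collections import defaultdict


def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equal-length slices."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def merge_repeats(acquisitions, tolerance):
    """Collapse repeated acquisitions of the same slice index.

    acquisitions: iterable of (slice_index, pixels) pairs, in acquisition order.
    Returns {slice_index: pixels} with one entry per index, in index order.
    """
    grouped = defaultdict(list)
    for index, pixels in acquisitions:
        grouped[index].append(pixels)

    merged = {}
    for index, copies in sorted(grouped.items()):
        first = copies[0]
        if all(mean_abs_diff(first, other) <= tolerance for other in copies[1:]):
            # All copies agree closely enough: average them pixel by pixel.
            merged[index] = [sum(values) / len(copies) for values in zip(*copies)]
        else:
            # Copies disagree: keep the first acquisition, drop the rest.
            merged[index] = first
    return merged
```

With the example above (one file holding slices [0, 2, 4, 6] and another holding [0, 1, 1, 3, 5]), the acquisitions from both files would be concatenated and passed to `merge_repeats` once per image, leaving only the file-reading part image-specific.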