I am brand new to Python and am playing around with it in the context of image analysis. Here I am using Python to run ilastik (https://www.ilastik.org/documentation/basics/headless.html) in headless mode. This is achieved using subprocess.run().

As part of this I need to pass various arguments that include the file location, the location of an associated file, and the path for any output files. I am able to define what these are, but I am wondering how to keep the information linked, so that the correct source directory, target directory, source file name and associated file name stay together.

I am wondering if a dataclass would be appropriate here. This is a simple version that works:

import subprocess
import glob
from pathlib import Path
from pathlib import PurePath

ILASTIK_EXECUTABLE = (Path("E:/Program Files/ilastik-1.4.0b15/ilastik.exe"))
PROJECT_FILE = Path("D:/Burch/DNDF/RimSeg.ilp")
SOURCE_DATA = Path("D:/Burch/DNDF/")
#for item in SOURCE_DATA:
#dir = os.listdir(SOURCE_DATA)
dir = Path.iterdir(SOURCE_DATA)

source_dirs = [d for d in SOURCE_DATA.rglob("") if d.name == "processed"]
print("Source directories:", *map(str, source_dirs), sep="\n")

#Create destination directory
folder_count=0
for folder in SOURCE_DATA.rglob('**/processed/'):
    #print(folder)
    #print(folder.parent)
    out_dir=folder.parent / "probabilities"
    Path.mkdir(out_dir, parents=False, exist_ok=True)
    folder_count=folder_count+1
#for file in SOURCE_DATA.rglob('**/processed/*.tif'):
   # print(file)
#Create Outdir list
out_dirs = [d for d in SOURCE_DATA.rglob("") if d.name == "probabilities"]
print("Target directories:", *map(str, out_dirs), sep="\n") #Is there a chance that in and out get mixed up?

print (len(source_dirs))
for i in range (0, len(source_dirs)):
    if source_dirs[i].parent==out_dirs[i].parent:
        print ("dirs are matched")
print (*map(str,out_dirs), sep="\n")

def genSeg():
    common_args = [
        str(ILASTIK_EXECUTABLE),
        "--headless",
        "--readonly=1",
        "--input_axes=cyx",
        "--export_source=Simple Segmentation Stage 2",
        "--output_format=tiff",
        "--export_dtype=uint8",
        #"--export_drange=(0,255)",
        "--project="+str(PROJECT_FILE),
    ]
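    # Assumed definition (not shown in the original post), based on the commented-out rglob above
    source_files = sorted(SOURCE_DATA.rglob("processed/*.tif"))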
    for file in source_files:
        #print (folder  / "*.tiff")
        args = [
            *common_args, #I think this concatenates the common-args list to this one. 
            "--output_filename_format="+str(file.parent.parent /  "probabilities/{nickname}_prediction.tiff"),
            #"--raw_data",
            #folder.rglob('**/processed/*.tif')
            #folder / "*.tif"
            file
        ]
        subprocess.run(map(str,args), check=True)
        print (*map(str,args), sep="\n")
        print("___________________")
        
genSeg()

I now need to define more files that are linked to the source file that need to be passed as args to subprocess.run().

An example of associated files is

#define probability files
prob_file_paths=[]
for file in source_files:
    prob_file=file.name[:-4]+"_Probabilities.h5"
    #print(prob_file)
    prob_file_path=file.parent/ prob_file
    if prob_file_path.exists():  # exists is a method; without the () the test is always true
        #print (prob_file_path)
        prob_file_paths.append(prob_file_path)
    else:
        print ("file not found")
print (*map(str, prob_file_paths), sep="\n")

So it is easy to create lists that contain the information, but the lists are not linked to each other.

How would one create a list of linked information?

Hope this made sense,

James

1 Answer


Yes, (data)classes could be a solution. They can serve as what is generically called a "record", i.e. some variables packed together.

In Python, dataclasses are a simpler way to declare regular classes: the @dataclass decorator generates the corresponding __init__ code for you.

from dataclasses import dataclass
from pathlib import Path

@dataclass
class ClassificationTask:
    source_directory_path: Path
    target_directory_path: Path
    source_filename: str
    associated_filename: str

An instance can be displayed with just print(classification_task), because dataclasses also auto-define a neat __repr__ method, which print falls back on when no __str__ is defined.
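To get one list of linked records instead of several parallel lists, each source file can be turned into one ClassificationTask. The following is only a minimal sketch: build_tasks is a hypothetical helper name, and the folder layout ("probabilities" next to "processed") and the "_Probabilities.h5" naming are assumptions taken from the code in the question.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ClassificationTask:
    source_directory_path: Path
    target_directory_path: Path
    source_filename: str
    associated_filename: str


def build_tasks(source_root: Path) -> List[ClassificationTask]:
    """Hypothetical helper: one record per .tif found in a 'processed' folder."""
    tasks = []
    for source_file in sorted(source_root.rglob("processed/*.tif")):
        tasks.append(
            ClassificationTask(
                source_directory_path=source_file.parent,
                # Assumed layout: 'probabilities' is a sibling of 'processed'.
                target_directory_path=source_file.parent.parent / "probabilities",
                source_filename=source_file.name,
                # Assumed naming: <stem>_Probabilities.h5, as in the question.
                associated_filename=source_file.stem + "_Probabilities.h5",
            )
        )
    return tasks
```

Calling build_tasks(SOURCE_DATA) would then give a single list in which every element carries its own source directory, target directory and file names, so they cannot get mixed up by index.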


Some remarks about your code (that you did not ask for):

  • Avoid calling a variable dir because it shadows the built-in function dir (but it is rarely used so it is not a real problem).
  • I did not know of print(..., sep="\n"), thanks ! :)
  • In
    args = [
        *common_args, #I think this concatenates the common-args list to this one. 
    ...
    ]
    
    This is iterable unpacking (informally called the "splat" operator): it unpacks the items of common_args into the args list. See this answer. Just an example:
    >>> common_args = ["a", "b", "c"]
    >>> [common_args, "d", "e"]
    [['a', 'b', 'c'], 'd', 'e']
    >>> [*common_args, "d", "e"]
    ['a', 'b', 'c', 'd', 'e']
    
    The difference is whether the list itself, or its values, are used.
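The same unpacking fits well with a dataclass: a method can combine the shared arguments with the values held by one record. This is a rough sketch, not a definitive implementation. The method name to_args is hypothetical, the --output_filename_format option is copied from the question, and the associated file is left out because the question does not show how it is passed to ilastik.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ClassificationTask:
    source_directory_path: Path
    target_directory_path: Path
    source_filename: str
    associated_filename: str

    def to_args(self, common_args: List[str]) -> List[str]:
        """Hypothetical helper: shared arguments followed by this task's own."""
        output_format = self.target_directory_path / "{nickname}_prediction.tiff"
        return [
            *common_args,  # unpack the shared arguments into the new list
            "--output_filename_format=" + str(output_format),
            str(self.source_directory_path / self.source_filename),
        ]


# Illustrative values only; the paths are made up for the example.
task = ClassificationTask(
    source_directory_path=Path("data/sample1/processed"),
    target_directory_path=Path("data/sample1/probabilities"),
    source_filename="img01.tif",
    associated_filename="img01_Probabilities.h5",
)
print(*task.to_args(["ilastik", "--headless"]), sep="\n")
```

The resulting list can be handed to subprocess.run(..., check=True) as in the question, and every argument comes from the same record.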
Lenormju
  • Many Thanks for the answer and the informative additional information @Lenormju. I would be interested to know if there is a better way to achieve this. – James Burchfield Sep 13 '21 at 23:43
  • @JamesBurchfield It is a good way to do it, define "better" if you want something specific. But for your problem, dataclasses would do just fine ! – Lenormju Sep 15 '21 at 10:01