I have a folder structure like the following:

MA/valid/wrist/pa/positive/image2.png

Basically, for each wrist there are multiple pa (patients), for each pa there is a positive or a negative study, and each study contains up to 3 images in PNG format.

I have written the code below, but it only goes down to the pa level; it does not load my image files. Any help with loading my image files would be appreciated.

import os

import cv2
import numpy as np


def load(Pic_Dir, Imsize):
    data = []
    # next(os.walk(...)) yields only the top level: (path, subdirs, files)
    dirs = next(os.walk(Pic_Dir))[1]

    for dir_name in dirs:
        files = next(os.walk(os.path.join(Pic_Dir, dir_name)))[2]
        print("load [", len(files), "] files from [", dir_name, "]")
        for image_name in files:
            image_path = os.path.join(Pic_Dir, dir_name, image_name)
            label = dir_name

            img = cv2.imread(image_path)
            if img is None:  # cv2.imread returns None for unreadable files
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = cv2.resize(img, (Imsize, Imsize))

            data.append([np.array(img), label])

    return data

The function is called with the following line:

data = load("/Users/bond/MA/train/XR_WRIST", 244)
  • give the following a try: [Recursive Sub-directory Traversal](https://stackoverflow.com/questions/50714469/recursively-iterate-through-all-subdirectories-using-pathlib) – Akshay Solunke Aug 18 '20 at 07:56
  • @qiriro thank you for your response. pa stands for patients. I have a thousand patients and some have positive studies and some have negative studies. The positive or negative studies then have multiple images. – Adeyemi Adejuwon Aug 18 '20 at 17:23

1 Answer

I am not sure I understood your question very well. However, if you need to walk through the directory and process all the image files in its sub-directories, I would suggest writing something like this:

import os


def load(root_directory, Imsize):
    # Assumption: each pa (patient) is a direct sub-directory of root_directory.
    pas = [d for d in os.listdir(root_directory)
           if os.path.isdir(os.path.join(root_directory, d))]
    cases = ["positive", "negative"]
    for pa in pas:
        for case in cases:
            in_dir = os.path.join(root_directory, pa, case)
            if not os.path.isdir(in_dir):
                continue  # this patient has no study of this kind
            all_images = [f for f in os.listdir(in_dir) if f.endswith('.png')]
            for image in all_images:
                # Do your processing here
                pass

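Basically, as you said, if you have many pa (what is pa?), you first need to get a list of all of them and loop through each one to access the list of cases = ["positive", "negative"]. This is not optimal. There are better ways to go through a directory, e.g., using pathlib's Path.rglob or the os.walk method you used before. Your code only calls next(os.walk(...)), which returns just the top level (the directory itself, its sub-directory names and its file names); looping over os.walk(...) instead visits every sub-directory, down to the image files. Please note that I am writing this code off the top of my head and did not test it in any way. As a side note, IMHO, I would refactor your method and call it as follows: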

def load(directory, pa, case):
    # Get images for the pa and case
    # Process the images
    pass

This would potentially reduce its complexity. In fact, to respect the single-responsibility principle (SRP), you would probably need to refactor the method much further. For example, you need a method that gets all the images of a directory:

def get_images(directory):
    pass

It would return the list of images (in this case, only .png files). Then, you would need another method that processes an image:

def process_image(image_path, Imsize):
    pass
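
Putting these pieces together, a minimal sketch of the single-responsibility split might look like the following. It assumes each pa is a direct sub-directory of the root and that the study folders are literally named positive and negative. The process callback parameter and the load_all helper are hypothetical additions (not part of the original answer); process stands in for whatever per-image work you need.

```python
import os


def get_images(directory):
    """Return sorted paths of the .png files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".png")
    )


def load(directory, pa, case, process):
    """Process every image of one pa/case study and label it with the case."""
    study_directory = os.path.join(directory, pa, case)
    if not os.path.isdir(study_directory):
        return []  # this patient has no study of this kind
    return [(process(path), case) for path in get_images(study_directory)]


def load_all(directory, process, cases=("positive", "negative")):
    """Hypothetical driver: run load() for every pa sub-directory and case."""
    data = []
    for pa in sorted(os.listdir(directory)):
        # Skip stray files at the patient level; only directories are pas.
        if os.path.isdir(os.path.join(directory, pa)):
            for case in cases:
                data.extend(load(directory, pa, case, process))
    return data
```

In practice, process would be a function that takes an image path and returns the image array, for example by running the cv2.imread, cv2.cvtColor and cv2.resize steps from the question. Passing it in keeps the traversal code independent of OpenCV.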

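Alternatively, the Path.rglob route mentioned above avoids writing one loop per directory level. Below is a minimal sketch, assuming the layout <root>/<pa>/<positive|negative>/<image>.png from the question, so that the label is the name of each image's parent folder and the pa is the folder above that. The collect_images name is hypothetical, and the sketch only gathers paths and labels; the actual image loading would still be done with cv2 as in the question.

```python
from pathlib import Path


def collect_images(root_directory):
    """Return (image_path, pa, label) tuples for every .png under root_directory.

    Assumes the hypothetical layout <root>/<pa>/<positive|negative>/<image>.png.
    """
    samples = []
    # rglob("*.png") descends into every sub-directory, however deep.
    # Note: on case-sensitive file systems it will not match "*.PNG".
    for image_path in sorted(Path(root_directory).rglob("*.png")):
        label = image_path.parent.name          # "positive" or "negative"
        pa = image_path.parent.parent.name      # the patient directory
        samples.append((image_path, pa, label))
    return samples
```

Each returned image_path is a pathlib.Path; passing str(image_path) to cv2.imread is the safe choice, since older OpenCV versions may only accept string paths.
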
I hope this helps!