I am trying to use Google Colab for image segmentation with U-Net. I can read the image datasets from Drive into Colab and store them in arrays. FYI: I have a folder in my Google Drive with all the training data, containing 2 sub-folders (Image and Mask respectively).

Now, after reading and resizing the images and masks, when I check the images using 'plt.show', I notice a discrepancy in the order of the images. For example, when I pick the 10th image, it does not match the 10th image in Google Drive. To make it worse, I get a completely different image for my mask, so the image and its mask no longer correspond (main issue!!).

Has anyone faced a similar situation? Any idea how I can get around this problem?

Rowzat
  • How do you read your data? You may try to use `glob.glob` and sort data as you wish - check [How is Pythons glob.glob ordered?](https://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered) – ans Mar 18 '21 at 08:44

1 Answer

I was having this issue. Importing images from a Google Drive directory into Google Colab seemed to return them in random order.

So I first ran this code to confirm my theory.

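import os

# os.listdir returns the entries in arbitrary order
inside = os.listdir('/content/gdrive/MyDrive/files/')
for name in inside[:20]:  # slicing avoids an IndexError with fewer than 20 files
    print(name)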

Which gave the output:

15588_KateOMara_32_f.jpg
15658_KatharineRoss_68_f.jpg
15741_MaryTamm_40_f.jpg
15661_KatharineRoss_72_f.jpg
15621_KateOMara_70_f.jpg
15646_KatharineRoss_46_f.jpg
15851_StВphaneAudran_22_f.jpg
15810_SarahDouglas_61_f.jpg
15486_JeanetteMacDonald_46_f.jpg
15831_StefaniePowers_56_f.jpg
15670_KathrynGrayson_26_f.jpg
15539_JulieBishop_36_f.jpg
15696_KathrynGrayson_75_f.jpg
15738_MaryTamm_33_f.jpg
15853_StВphaneAudran_24_f.jpg
15665_KathrynGrayson_21_f.jpg
15815_StefaniePowers_24_f.jpg
15748_MaryTamm_51_f.jpg
15759_PamelaSueMartin_26_f.jpg
15799_SarahDouglas_43_f.jpg

I was using

os.listdir(self.directory)

which returns a list of all files and directories in the specified path, in arbitrary order. So I just wrapped the call in the built-in

sorted()

function to sort the list, and this solved the issue.

# sorted() returns a new list in lexicographic (string) order
sorted_dir = sorted(os.listdir('/content/gdrive/MyDrive/files/'))
for name in sorted_dir[:20]:
    print(name)

Output:

0_MariaCallas_35_f.jpg
10000_GlennClose_62_f.jpg
10001_GoldieHawn_23_f.jpg
10002_GoldieHawn_24_f.jpg
10003_GoldieHawn_24_f.jpg
10004_GoldieHawn_27_f.jpg
10005_GoldieHawn_28_f.jpg
10006_GoldieHawn_29_f.jpg
10007_GoldieHawn_30_f.jpg
10008_GoldieHawn_31_f.jpg
10009_GoldieHawn_35_f.jpg
1000_StephenHawking_1_m.jpg
10010_GoldieHawn_35_f.jpg
10011_GoldieHawn_37_f.jpg
10012_GoldieHawn_39_f.jpg
10013_GoldieHawn_44_f.jpg
10014_GoldieHawn_45_f.jpg
10015_GoldieHawn_45_f.jpg
10016_GoldieHawn_50_f.jpg
10017_GoldieHawn_51_f.jpg
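
Note that this output is in lexicographic (string) order, which is why `1000_StephenHawking_1_m.jpg` lands between `10009_...` and `10010_...`. That is already enough to make the order reproducible. If numeric order matters as well, one option is a sort key that reads the leading number. This is only a minimal sketch, assuming every file name starts with an integer prefix as in the listing above; `leading_number` is a hypothetical helper, not something provided by `os`.

```python
import re

def leading_number(filename):
    """Return the integer prefix of a name like '1000_StephenHawking_1_m.jpg'."""
    match = re.match(r"\d+", filename)
    # Names without a numeric prefix get -1 so they sort first
    # (an arbitrary choice for this sketch).
    return int(match.group()) if match else -1

names = [
    "10000_GlennClose_62_f.jpg",
    "0_MariaCallas_35_f.jpg",
    "1000_StephenHawking_1_m.jpg",
]

string_order = sorted(names)                       # lexicographic order
numeric_order = sorted(names, key=leading_number)  # order by leading number
print(numeric_order)
```

With real data, the same key can be passed as `sorted(os.listdir(path), key=leading_number)`.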

Before:

for i, file in enumerate(os.listdir(self.directory)):
    file_labels = parse('{}_{person}_{age}_{gender}.jpg', file)

After:

for i, file in enumerate(sorted(os.listdir(self.directory))):
    file_labels = parse('{}_{person}_{age}_{gender}.jpg', file)
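
For the image/mask case in the question, the same idea can be applied to both folders so that index `i` refers to the same sample in each. The following is a rough sketch, not a drop-in solution: it assumes each mask has exactly the same file name as its image, and `paired_paths` and the folder arguments are hypothetical names chosen for illustration.

```python
import os

def paired_paths(image_dir, mask_dir):
    """Return (image_path, mask_path) pairs in a reproducible, matching order."""
    # Sort both listings, because os.listdir gives no ordering guarantee.
    image_names = sorted(os.listdir(image_dir))
    mask_names = sorted(os.listdir(mask_dir))

    # Assumption: a mask is stored under the same file name as its image.
    # Fail early instead of silently pairing the wrong files.
    if image_names != mask_names:
        raise ValueError("image and mask folders do not contain the same file names")

    return [
        (os.path.join(image_dir, name), os.path.join(mask_dir, name))
        for name in image_names
    ]
```

Reading images and masks by iterating over the returned pairs keeps the two arrays aligned. If the masks follow a different naming scheme, the equality check would need to be replaced by a mapping from image name to mask name.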
nahidosen