I was having this issue: importing images from a directory on Google Drive into Google Colab seemed to load them in a random order.
So I first ran the following code to confirm my theory:
```python
import os

inside = os.listdir('/content/gdrive/MyDrive/files/')
for i in range(20):
    print(inside[i])
```
This gave the following output:
```text
15588_KateOMara_32_f.jpg
15658_KatharineRoss_68_f.jpg
15741_MaryTamm_40_f.jpg
15661_KatharineRoss_72_f.jpg
15621_KateOMara_70_f.jpg
15646_KatharineRoss_46_f.jpg
15851_StВphaneAudran_22_f.jpg
15810_SarahDouglas_61_f.jpg
15486_JeanetteMacDonald_46_f.jpg
15831_StefaniePowers_56_f.jpg
15670_KathrynGrayson_26_f.jpg
15539_JulieBishop_36_f.jpg
15696_KathrynGrayson_75_f.jpg
15738_MaryTamm_33_f.jpg
15853_StВphaneAudran_24_f.jpg
15665_KathrynGrayson_21_f.jpg
15815_StefaniePowers_24_f.jpg
15748_MaryTamm_51_f.jpg
15759_PamelaSueMartin_26_f.jpg
15799_SarahDouglas_43_f.jpg
```
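I was using `os.listdir(self.directory)`, which returns a list of all the files and directories in the given path. The Python documentation states that this list is in arbitrary order, so there is no guarantee it follows the file names. I wrapped the call in the built-in `sorted()` function to sort the list, and this solved the issue: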
```python
sorted_dir = sorted(os.listdir('/content/gdrive/MyDrive/files/'))
for i in range(20):
    print(sorted_dir[i])
```
Output:
```text
0_MariaCallas_35_f.jpg
10000_GlennClose_62_f.jpg
10001_GoldieHawn_23_f.jpg
10002_GoldieHawn_24_f.jpg
10003_GoldieHawn_24_f.jpg
10004_GoldieHawn_27_f.jpg
10005_GoldieHawn_28_f.jpg
10006_GoldieHawn_29_f.jpg
10007_GoldieHawn_30_f.jpg
10008_GoldieHawn_31_f.jpg
10009_GoldieHawn_35_f.jpg
1000_StephenHawking_1_m.jpg
10010_GoldieHawn_35_f.jpg
10011_GoldieHawn_37_f.jpg
10012_GoldieHawn_39_f.jpg
10013_GoldieHawn_44_f.jpg
10014_GoldieHawn_45_f.jpg
10015_GoldieHawn_45_f.jpg
10016_GoldieHawn_50_f.jpg
10017_GoldieHawn_51_f.jpg
```
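Note that `sorted()` compares the names as strings, so the order above is lexicographic rather than numeric: `1000_StephenHawking_1_m.jpg` lands between `10009_GoldieHawn_35_f.jpg` and `10010_GoldieHawn_35_f.jpg`. That is enough to make the order stable between runs, which is what fixed the problem here. If you also want the files in numeric ID order, one option is a sort key on the leading number. This is a minimal sketch assuming every name starts with an integer followed by an underscore; `numeric_prefix_key` is a hypothetical helper, not part of the original code.

```python
import re


def numeric_prefix_key(filename: str):
    """Hypothetical sort key: integer before the first underscore, then the name.

    Assumes names look like '<id>_<person>_<age>_<gender>.jpg'. Names without
    a numeric prefix sort after all numbered ones, alphabetically.
    """
    match = re.match(r"(\d+)_", filename)
    if match is None:
        return (1, 0, filename)
    return (0, int(match.group(1)), filename)


if __name__ == "__main__":
    names = [
        "10000_GlennClose_62_f.jpg",
        "1000_StephenHawking_1_m.jpg",
        "0_MariaCallas_35_f.jpg",
        "10010_GoldieHawn_35_f.jpg",
    ]
    print(sorted(names))                          # string (lexicographic) order
    print(sorted(names, key=numeric_prefix_key))  # numeric ID order
```

In Colab this would be used as `sorted(os.listdir(path), key=numeric_prefix_key)`. Either ordering is deterministic; the key only matters if the numeric sequence itself is meaningful to you.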
Before:
```python
for i, file in enumerate(os.listdir(self.directory)):
    file_labels = parse('{}_{person}_{age}_{gender}.jpg', file)
```
After:
```python
# sorted() makes the iteration order deterministic across runs
for i, file in enumerate(sorted(os.listdir(self.directory))):
    file_labels = parse('{}_{person}_{age}_{gender}.jpg', file)
```
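For reference, here is a minimal, self-contained sketch of the fixed loop. It swaps the third-party `parse` call for a standard-library regular expression so it runs without extra installs; `FILENAME_PATTERN` and `load_labels` are hypothetical names, and the pattern assumes the `<id>_<person>_<age>_<gender>.jpg` scheme seen in the listings above, with a numeric age.

```python
import os
import re

# Hypothetical stand-in for parse('{}_{person}_{age}_{gender}.jpg', file):
# four underscore-separated fields; age is assumed to be an integer.
FILENAME_PATTERN = re.compile(
    r"^([^_]+)_(?P<person>[^_]+)_(?P<age>\d+)_(?P<gender>[^_]+)\.jpg$"
)


def load_labels(directory: str) -> list:
    """Return one label dict per matching .jpg file, in sorted filename order."""
    labels = []
    # sorted() fixes the order; os.listdir alone makes no ordering promise.
    for file in sorted(os.listdir(directory)):
        match = FILENAME_PATTERN.match(file)
        if match is None:
            continue  # skip entries that do not follow the naming scheme
        labels.append({
            "file": file,
            "person": match.group("person"),
            "age": int(match.group("age")),
            "gender": match.group("gender"),
        })
    return labels
```

Because the listing is sorted before iterating, the same directory always yields its labels in the same order, so image and label indices stay aligned between runs.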