
I want to load file names into an array so that I can load images from their paths. The solution provided here does this job. My code looks something like this:

fileNames = glob.glob(os.path.join(directory, "*.jpg"))

My file names follow this pattern:

{videoNo}_{frameNo}_{patchNo}.jpg

For example their names are like this

1_1_1.jpg
1_1_2.jpg
1_1_3.jpg
.
.
.
10_1_1.jpg
10_1_2.jpg

When I load filenames into fileNames array, they are like this

10_1_1.jpg
10_1_2.jpg
.
.
.
1_1_1.jpg
1_1_2.jpg
1_1_3.jpg

As far as I know, this is because the ASCII code for _ (95) is greater than the one for 0 (48), so names starting with 10_ sort before names starting with 1_ and the list is not in numeric order. I need to work with the numerically sorted list. Can anyone give me a hand here?


EDIT
Please note that for these file names

 ["1_1_1.jpg", "10_1_3.jpg", "1_1_2.jpg", "10_1_2.jpg", "1_1_3.jpg", "1_20_1.jpg", "1_2_1.jpg", "1_14_1.jpg"]

the sorted result I am looking for is this list

 ["1_1_1.jpg", "1_1_2.jpg", "1_1_3.jpg", "1_2_1.jpg", "1_14_1.jpg", "1_20_1.jpg", "10_1_2.jpg", "10_1_3.jpg"]
Shahroozevsky

2 Answers


The sorted builtin and list.sort method both take a key parameter that specifies how to do the sorting. If you want to sort by the numbers in the name (i.e. videoNo, then frameNo, then patchNo) you can split each name into these numbers:

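import glob
import os

fileNames = sorted(
    glob.glob(os.path.join(directory, "*.jpg")),
    # Key on the base name only; glob returns paths that include the directory
    key=lambda item: [
        int(part)
        for part in os.path.splitext(os.path.basename(item))[0].split('_')
    ],
)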

The splitting strips off the .jpg extension, then cuts the name on each _. Conversion to int is needed because strings use lexicographic sorting, e.g. "2" > "10".
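To check the idea without touching the file system, here is a minimal sketch that applies the same kind of key to the list from the question's edit (with a .jpg extension on every entry). The helper name numeric_key and the folder name frames_v2 are made up for illustration; the helper keys on the base name only, on the assumption that the paths may carry a directory prefix.

```python
import os


def numeric_key(path):
    """Return the underscore-separated numbers of a file's base name as ints."""
    # Drop the directory part and the extension, e.g. "dir/10_1_2.jpg" -> "10_1_2"
    stem = os.path.splitext(os.path.basename(path))[0]
    return [int(part) for part in stem.split("_")]


names = [
    "1_1_1.jpg", "10_1_3.jpg", "1_1_2.jpg", "10_1_2.jpg",
    "1_1_3.jpg", "1_20_1.jpg", "1_2_1.jpg", "1_14_1.jpg",
]

# Lists of ints compare element by element, so 2 sorts before 14 and 1 before 10
sorted_names = sorted(names, key=numeric_key)
print(sorted_names)
```

This should print the names ordered by videoNo, then frameNo, then patchNo. Note that int() raises ValueError if a name contains anything other than digits between the underscores.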

MisterMiyagi

You could use a regular expression to extract the numbers from the file names and sort by those:

>>> import re
>>> files = ["10_1_3.jpg", "1_10_2.jpg", "3_1_1.jpg", "30_1_2.jpg"]
>>> sorted(files, key=lambda f: tuple(map(int, re.findall(r"\d+", f))))
['1_10_2.jpg', '3_1_1.jpg', '10_1_3.jpg', '30_1_2.jpg']
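When the names come from glob.glob they include the directory, and any digits in the directory name would end up in the key as well. A minimal sketch that restricts the regular expression to the base name is below; digits_key and the folder name clips2 are hypothetical, and it assumes the extension itself contains no digits.

```python
import os
import re


def digits_key(path):
    """Return every run of digits in the file's base name as a tuple of ints."""
    # basename() drops "clips2/", so the 2 in the folder name is not picked up
    return tuple(int(n) for n in re.findall(r"\d+", os.path.basename(path)))


paths = [
    "clips2/10_1_3.jpg",
    "clips2/1_10_2.jpg",
    "clips2/3_1_1.jpg",
    "clips2/30_1_2.jpg",
]

sorted_paths = sorted(paths, key=digits_key)
print(sorted_paths)
```

Compared with splitting on _, this approach also tolerates non-numeric text around the numbers, since re.findall simply skips it.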
tobias_k