I want to iterate through a set of JPG files in a folder and combine them into a single PDF. Each JPG is one page of the PDF, so the files must come back in page order when I iterate through the folder.
My JPG file structure in this folder is like the following:
filename_1.jpg
filename_2.jpg
filename_3.jpg
filename_4.jpg
filename_5.jpg
filename_6.jpg
filename_7.jpg
filename_8.jpg
filename_9.jpg
filename_10.jpg
filename_11.jpg
filename_12.jpg
filename_13.jpg
filename_14.jpg
filename_15.jpg
The number at the end of each filename is the page number in the PDF.
When I do the following to test whether the files are sorted in the correct order:
for file in sorted(os.listdir(folder_path)):
    print(file)
I get the following output from the sorted function:
filename_1.jpg
filename_10.jpg
filename_11.jpg
filename_12.jpg
filename_13.jpg
filename_14.jpg
filename_15.jpg
filename_2.jpg
filename_3.jpg
filename_4.jpg
filename_5.jpg
filename_6.jpg
filename_7.jpg
filename_8.jpg
filename_9.jpg
This is the correct order for a lexicographic (string) sort, but it is not page order, so the resulting PDF will have its pages out of sequence. I know that zero-padding the single-digit page numbers would fix this (i.e. filename_01.jpg instead of filename_1.jpg). However, I have over 8,000 JPG files across more than 600 folders, and renaming all of the single-digit files is not a straightforward task for me to take on.
Does anyone have a suggestion on how I can get these files to sort appropriately based on the page number at the end of the filename?
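For reference, here is a minimal sketch of the kind of thing I imagine might work: passing a key function to sorted so the comparison uses the trailing number as an integer rather than as text. The helper name page_number and the regular expression are my own assumptions, and it assumes every filename ends in an underscore, digits, and .jpg (or .jpeg). I have not confirmed it covers every file in my folders.

```python
import re

def page_number(filename):
    """Return the trailing page number of a name like 'filename_12.jpg' as an int."""
    # Assumed pattern: underscore, digits, then .jpg or .jpeg at the end of the name.
    match = re.search(r"_(\d+)\.jpe?g$", filename, re.IGNORECASE)
    # Names that do not match get -1, so they sort first instead of raising.
    return int(match.group(1)) if match else -1

# Sample names standing in for os.listdir(folder_path).
names = [
    "filename_1.jpg",
    "filename_10.jpg",
    "filename_11.jpg",
    "filename_2.jpg",
    "filename_9.jpg",
]

# Sort by the numeric page number instead of by the raw string.
ordered = sorted(names, key=page_number)
print(ordered)
```

With a real folder, the call would presumably become sorted(os.listdir(folder_path), key=page_number), which leaves the files on disk untouched.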