1

EDIT: Thanks to a commenter I see that the root of the problem is that os.scandir is sorting arbitrarily. However I don't have the skills to follow through.

So I have very little experience. I have a few dozen root-folders that are each full of 8-600 sub-folders each full of hundreds of images; that's to say, many thousands of images to process. The sub-folders are named normally ordered numbers (1, 2, 3,...598, 599, 600) and the contained images are named after the sub-folder and then in order ex. for one of the first sub-folders is 001-001, 001-002, 001-003,...001-239. With the final img collection in the example root folder being completed with img 0022-264.

Think of the images like pages and the sub-folders as chapters. The pages inside each chapter are in order but the chapters themselves are out of natural order.

(If you're curious why I'm using lossless conversion with a lossy file format it's because the original images were png but truncated so I had to first get a program built to batch convert them to JPEG with high tolerance)

The below program correctly goes through all the sub-folders in an inputted root-folder, correctly orders the img files relative to the sub-folder, however it incorrectly orders the batches of images in the final pdf for example one collection will jump from img 001-045 to 0010-001 to 00100-001 instead of the correct 001-045 to 002-001 in normal order. (This is my first question sorry if the pasted program is not exactly "minimal" but it should be "reproducible".)

import os
import img2pdf
from typing import List, Tuple

def convert_images_to_pdf(image_paths: List[str], pdf_path: str) -> None:
    """
    This function takes a list of image paths and converts them into a single PDF file.

    Args:
        image_paths: A list of file paths for the images to be converted to PDF.
        pdf_path: A file path for the output PDF.

    Returns:
        None. Generates a message after the PDF is generated or an error is raised.
    """
    try:
        with open(pdf_path, 'wb') as pdf_file:
            pdf_file.write(img2pdf.convert(image_paths))
        print(f"PDF successfully generated at {pdf_path}")
    except (ValueError, FileNotFoundError) as error:
        print(f"Error occurred while generating PDF: {error}")


def create_folder_pdf(folder_path: str) -> None:
    """
    This function takes a folder path and converts all JPEG files inside the folder and subfolders to create a single PDF file.

    Args:
        folder_path: A file path for the folder that contains images.

    Returns:
        None. Prints a warning message if no images are found, or an error message if an exception is raised.
    """
    try:
        with os.scandir(folder_path) as folder_iterator:
            
            for item in folder_iterator:
                
                if item.is_dir():
                    
                    subfolder_name = item.name
                    subfolder_path = os.path.join(folder_path, subfolder_name)
                    image_paths = []
                    
                    for file in os.scandir(subfolder_path):
                        
                        if file.name.lower().endswith(('.jpeg', '.jpg')):
                            image_paths.append(file.path)
                            
                    if image_paths:
                        pdf_path = os.path.join(subfolder_path, f'{subfolder_name}.pdf')
                        convert_images_to_pdf(image_paths, pdf_path)
                        
                    else:
                        print(f"No JPEG files found in {subfolder_name}!")
    except OSError as error:
        print(f"Error occurred while opening the folder at {folder_path}: {error}")
        return


if __name__ == "__main__":
    folder_path = input("Enter the path of the folder with JPEG images: ")
    
    if not os.path.exists(folder_path):
        print(f"Path {folder_path} does not exist.")
        
    else:
        create_folder_pdf(folder_path)
Matt Bone
  • 11
  • 2
  • It's worth mentioning that `os.scandir()` returns its results in arbitrary order. Per the [docs](https://docs.python.org/3/library/os.html#os.scandir): `The entries are yielded in arbitrary order,`. If you want a different order, you need to sort it. – Nick ODell Apr 14 '23 at 22:59
  • 1
    Alright so I looked up how to sort it and I'm seeing that I can use a library called nat_sort? I also found another solution ```sorted(os.scandir("folder_path"), key=lambda e: e.name)``` but that just returns "Error occurred while opening the folder at ...: [WinError 3] The system cannot find the path specified: 'folder_path' I don't understand. – Matt Bone Apr 15 '23 at 01:21
  • That'sss pretty close, I could try renaming the sub-folders with an A in front of them all. But I'm not exactly jumping to manually change hundreds of folder names. – Matt Bone Apr 15 '23 at 01:36

0 Answers0