In my code I'm converting multiple one-page PDFs to PNG format. The conversion itself works well with cv2, but many of the PDF file names contain German umlauts (ä, ö, ü), and the resulting PNG file names end up with special characters.
Example: After converting the PDF (lösung_122.pdf) to PNG, the file name looks like this: "lösung_122.png". It should be loesung_122.png.
I would like to replace all of these characters (ä, ö, ü) in the document names with ae, oe, ue.
How can I adjust my code to achieve this? What options do I have? Maybe there's a way to rename the documents (PDFs) before converting them?
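To make the goal concrete, here is a minimal sketch of the replacement I have in mind. The helper name `replace_umlauts` is hypothetical, and the uppercase and ß entries are my own assumption beyond the three characters listed above. The normalization step is only there in case a file name stores "ö" as "o" plus a combining diaeresis.

```python
import unicodedata

# Assumed mapping: only ä, ö, ü come from the requirement above;
# the uppercase and ß entries are an illustrative extension.
UMLAUT_MAP = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
    'ß': 'ss',
})


def replace_umlauts(name):
    """Return name with German umlauts spelled out as ASCII letters."""
    # Compose decomposed characters (e.g. "o" + combining diaeresis)
    # into single code points so the mapping can match them.
    composed = unicodedata.normalize('NFC', name)
    return composed.translate(UMLAUT_MAP)


print(replace_umlauts('lösung_122.png'))  # prints loesung_122.png
```

In my loop this would presumably be applied to `pdf_file[:-4]` before building the output path, but I'm not sure whether that is the cleanest place for it.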
from pdf2image import convert_from_path
import os
import cv2

if __name__ == '__main__':
    # Init
    dir_name = os.getcwd()
    path_pdf = dir_name + '/data/doc/October'  # Folder containing all documents (PDF)
    save_path = dir_name + '/data/blanko/'  # Folder with all converted doc (PNG)
    # Loop sub Folders:
    files = os.listdir(path_pdf)
    for pdf_file in files:
        # Check if PDF file
        if pdf_file[-3:] == 'pdf':
            images = convert_from_path(path_pdf + '/' + pdf_file, dpi=300, poppler_path='C:/Develop/poppler-0.68.0_x86/poppler-0.68.0/bin')
            # Save Images
            images[0].save(save_path + 'tmp.png', 'PNG')
            img = cv2.imread(save_path + 'tmp.png')
            cv2.imwrite(save_path + pdf_file[:-4] + '.png', img)
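The other idea, renaming the PDFs before converting them, might look roughly like the sketch below. The function name `rename_pdfs_without_umlauts` is hypothetical, it only handles the lowercase ä, ö, ü, and it skips a file if the target name already exists (an assumption on my part, to avoid overwriting anything). I have not tried it on my real folder.

```python
import os
import unicodedata


def rename_pdfs_without_umlauts(folder):
    """Rename PDFs in folder so that ä, ö, ü become ae, oe, ue.

    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for old_name in os.listdir(folder):
        if not old_name.lower().endswith('.pdf'):
            continue
        # Compose decomposed umlauts so the plain replace() calls match.
        new_name = unicodedata.normalize('NFC', old_name)
        for umlaut, ascii_form in (('ä', 'ae'), ('ö', 'oe'), ('ü', 'ue')):
            new_name = new_name.replace(umlaut, ascii_form)
        if new_name == old_name:
            continue
        target = os.path.join(folder, new_name)
        if os.path.exists(target):
            # Assumed policy: never overwrite an existing file.
            continue
        os.rename(os.path.join(folder, old_name), target)
        renamed.append((old_name, new_name))
    return renamed
```

It would be called once on `path_pdf` before the conversion loop, so that `os.listdir` only sees the ASCII names afterwards.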
Any help appreciated
Regards