I'm trying to read some PDFs located in one directory and output images of their pages to a different directory.
(I'm trying to learn how this code works, and I'm hoping there's a cleaner way to specify an output directory for my image files.)
What I've done works, but I think it just switches the working directory back and forth between my save directory and my PDF directory.
This doesn't feel like a clean approach. Is there a better option that preserves the existing code and accomplishes what my added lines do?
import os
from pdf2image import convert_from_path

pdf_dir = r"mydirectorypathwithPDFs"
save_dir = 'mydirectorypathforimages'

os.chdir(pdf_dir)
for pdf_file in os.listdir(pdf_dir):
    os.chdir(pdf_dir)  # I added this: change back to the PDF directory
    if pdf_file.endswith(".pdf"):
        pages = convert_from_path(pdf_file, 300)
        pdf_file = pdf_file[:-4]
        for page in pages:
            os.chdir(save_dir)  # I added this: change to the save directory
            page.save("%s-page%d.jpg" % (pdf_file, pages.index(page)), "JPEG")
The code I slightly modified was created by @photek1944 and found here: https://stackoverflow.com/a/53463015/10216912
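
For comparison, here is a minimal sketch of one possible way to do this without any `os.chdir` calls: build full paths with `os.path.join` for both reading the PDF and saving each page. The helper names `output_path` and `convert_directory` are made up for illustration, the directory strings are the same placeholders as above, and it assumes `pdf2image` (and the poppler tools it depends on) is installed.

```python
import os


def output_path(save_dir, pdf_name, page_number):
    """Build the full path of one page image: <save_dir>/<pdf stem>-page<N>.jpg."""
    stem = os.path.splitext(pdf_name)[0]  # drop the ".pdf" extension
    return os.path.join(save_dir, "%s-page%d.jpg" % (stem, page_number))


def convert_directory(pdf_dir, save_dir, dpi=300):
    """Convert every .pdf in pdf_dir to JPEG pages in save_dir (hypothetical helper)."""
    # Imported here so output_path can be used without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(save_dir, exist_ok=True)  # create the output directory if needed
    for pdf_file in os.listdir(pdf_dir):
        if not pdf_file.endswith(".pdf"):
            continue
        # Pass the full input path, so the working directory never matters.
        pages = convert_from_path(os.path.join(pdf_dir, pdf_file), dpi)
        # enumerate numbers the pages from 0, instead of pages.index(page).
        for number, page in enumerate(pages):
            page.save(output_path(save_dir, pdf_file, number), "JPEG")
```

It would be called as `convert_directory(pdf_dir, save_dir)`. Because both the input and output paths are absolute or relative to wherever the script was started, the current working directory is never changed.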