Convert multiple multipage PDFs to JPGs in subfolders

Question

Simple use case:

A folder with many (mostly multipage) PDF files.
A script should convert each PDF page to JPG and store it in a subfolder named after the PDF filename. (e.g. #33.pdf to folder #33)
Single JPG files should also have this filename plus a counter mirroring the sequential page number in the PDF. (e.g. #33_001.jpg)

I found a bunch of related questions, but nothing that quite does what I want, e.g.

How do I convert multiple PDFs into images from the same folder in Python?

A Python script would work fine, but any other way to do this in Win10 (ImageMagick, e.g.) is cool with me too.

You can convert multipage pdfs to separate jpg files by `convert -density XXX image.pdf -scene 1 -set filename:fn "%t" "%[filename:fn]_%03d.jpg"`. That will produce image_001.jpg image_002.jpg ... etc. The filename part is the way Imagemagick automatically sets the output name to be the same as the input name. The %03d sets the page counter part, and -scene 1 makes it start at 001 rather than 000 — fmw42, Nov 23 '22 at 17:08
But how to batch process multiple files and batch create corresponding subfolders? — Chris, Nov 23 '22 at 21:01
Write a .bat script FOR loop over each image. Use %[filename:fn] for the directory as well as the file name. "%[filename:fn]/%[filename:fn]_%03d.jpg" should name the folder and the file with the input name. — fmw42, Nov 23 '22 at 21:28
And is convert able to create a folder if it does not exist? Do you maybe have a link to an example of such a .bat loop? Apologies for my ignorance. — Chris, Nov 23 '22 at 21:32
Imagemagick will not create new directories. They will have to exist already. Your .bat script can create the directories and then call Imagemagick. Sorry, I am not a Windows user and do not script .bat. — fmw42, Nov 24 '22 at 01:05
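
Following the comments above, here is a minimal Python sketch of that approach: create each subfolder first, then call ImageMagick once per PDF. Several things are assumptions not confirmed in this thread: that ImageMagick 7 is on PATH as `magick` (with Ghostscript installed for PDF input), and that `-scene 1` is wanted so the counter starts at 001. The names `magick_command` and `convert_folder` are only illustrative. A literal `%` in a PDF name would clash with the `%03d` pattern and is not handled.

```python
import subprocess
import sys
from pathlib import Path


def magick_command(pdf: Path, density: int = 200) -> list[str]:
    """Build an ImageMagick call that writes <stem>/<stem>_001.jpg, _002.jpg, ..."""
    # Output pattern lives in a subfolder named after the PDF (without .pdf).
    out_pattern = pdf.parent / pdf.stem / f"{pdf.stem}_%03d.jpg"
    return [
        "magick",                    # assumption: ImageMagick 7 executable on PATH
        "-density", str(density),    # render resolution, set before reading the PDF
        str(pdf),
        "-scene", "1",               # start the page counter at 1 instead of 0
        str(out_pattern),
    ]


def convert_folder(folder: Path, density: int = 200) -> None:
    """Convert every PDF in `folder`, one subfolder per PDF."""
    for pdf in sorted(folder.glob("*.pdf")):
        # ImageMagick does not create directories, so make the folder first.
        (pdf.parent / pdf.stem).mkdir(exist_ok=True)
        result = subprocess.run(magick_command(pdf, density))
        if result.returncode != 0:
            print(f"magick failed on {pdf.name} (exit code {result.returncode})")


# Only runs when a folder is passed, e.g.: python pdf2jpg_magick.py "C:\scans"
if __name__ == "__main__" and len(sys.argv) > 1:
    convert_folder(Path(sys.argv[1]))
```

Building the command as a list avoids shell quoting problems with names such as `#33.pdf` or names containing spaces.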

K J · Accepted Answer · 2022-11-27T22:14:54.807

Your comment asks how a batch file can do what is required. For simplicity, the following processes only a single file, so Python will need to loop through the folder and call it with each name in turn. That could also be done by adding a FOR loop to the batch file, but first see where problems arise, as many of my single test files threw differing errors.

I have tried to cover several failure cases in this batch file, on my system, but there can still be issues, such as a file that has no valid fonts to display.

For the most recent Poppler Windows 64-bit utilities see https://github.com/oschwartz10612/poppler-windows/releases/ and for 32-bit use the latest Xpdf version from http://www.xpdfreader.com/download.html (that one has a direct pdftopng.exe, so the batch file needs a few edits).

pdf2dir.bat

@echo off
set "bin=C:\Apps\PDF\poppler\22.11.0\Library\bin"
REM render resolution in DPI
set "res=200"
REM for type use one of 3 i.e. png jpeg jpegcmyk (PNG is best for documents)
set "type=png"
REM pdftoppm writes .jpg files for both jpeg and jpegcmyk, so check for that extension
set "ext=%type%"
if /i not "%type%"=="png" set "ext=jpg"

if exist "%~dpn1\*.%ext%" echo: &echo Files already exist in "%~dpn1" skipping overwrite&goto pause
if not exist "%~dpn1.pdf" echo: &echo "%~dpn0" File "%~dpn1.pdf" not found&goto pause

if not exist "%~dpn1\*.*" md "%~dpn1"

REM following line deliberately opens folder to show progress delete it or prefix with REM for blind running
explorer "%~dpn1"

"%bin%\pdftoppm.exe" -%type% -r %res% "%~dpn1.pdf" "%~dpn1\%~n1"
REM any non-zero exit code from pdftoppm counts as a failure
if errorlevel 1 echo: &echo Build of %type% files failed&goto pause
if not exist "%~dpn1\*.%ext%" echo: &echo Build of %type% files failed&goto pause

:pause
echo:
pause
:end

It requires the path to the Poppler binaries (where pdftoppm.exe lives) to be set correctly in the second line.
It can be placed wherever desired, e.g. the work folder or the desktop.
Dragging and dropping one PDF onto it should work, without any need to run it in a console.
In a command console, type the batch file name followed by a space, then drag and drop a single filename; any name containing spaces must be "double quoted".
It can be run from any shell or OS command as "path to/batchfile.bat" "c:\path to\file.pdf"
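
For the folder loop itself, a minimal Python sketch along the same lines could call pdftoppm directly for each PDF instead of going through the batch file. It is not a tested replacement for the batch file. The Poppler path is the one from pdf2dir.bat and will need adjusting, and the helper names (`pdftoppm_command`, `renamed`, `convert_folder`) are made up for illustration. Because pdftoppm names its pages like `name-1.jpg` or `name-01.jpg`, the sketch renames them afterwards to the `name_001.jpg` pattern asked for in the question.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: same Poppler location as in pdf2dir.bat; adjust to your install.
PDFTOPPM = Path(r"C:\Apps\PDF\poppler\22.11.0\Library\bin\pdftoppm.exe")


def pdftoppm_command(pdf: Path, out_dir: Path, dpi: int = 200) -> list[str]:
    """Build the pdftoppm call for one PDF (JPEG output, one file per page)."""
    prefix = out_dir / pdf.stem  # pdftoppm appends "-<page>.jpg" to this prefix
    return [str(PDFTOPPM), "-jpeg", "-r", str(dpi), str(pdf), str(prefix)]


def renamed(page_file: Path) -> Path:
    """Map 'name-7.jpg' or 'name-07.jpg' to 'name_007.jpg'."""
    stem, _, page = page_file.stem.rpartition("-")
    return page_file.with_name(f"{stem}_{int(page):03d}{page_file.suffix}")


def convert_folder(folder: Path, dpi: int = 200) -> None:
    """Convert every PDF in `folder` into a subfolder named after the PDF."""
    for pdf in sorted(folder.glob("*.pdf")):
        out_dir = folder / pdf.stem
        if out_dir.exists():
            # Same policy as the batch file: never overwrite earlier output.
            print(f"skipping {pdf.name}: {out_dir} already exists")
            continue
        out_dir.mkdir()
        result = subprocess.run(pdftoppm_command(pdf, out_dir, dpi))
        if result.returncode != 0:
            print(f"pdftoppm failed on {pdf.name} (exit code {result.returncode})")
            continue
        # The folder is new, so every .jpg in it came from this PDF.
        for page_file in sorted(out_dir.glob("*.jpg")):
            page_file.rename(renamed(page_file))


# Only runs when a folder is passed, e.g.: python pdf2dir.py "C:\scans"
if __name__ == "__main__" and len(sys.argv) > 1:
    convert_folder(Path(sys.argv[1]))
```

A PDF that fails (no valid fonts, permissions, and so on) is reported and skipped, so one bad file does not stop the rest of the folder.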
