Simple use case:

  • A folder with many (mostly multipage) PDF files.
  • A script should convert each PDF page to JPG and store it in a subfolder named after the PDF filename. (e.g. #33.pdf to folder #33)
  • Single JPG files should also have this filename plus a counter mirroring the sequential page number in the PDF. (e.g. #33_001.jpg)

I found a bunch of related questions, but nothing that quite does what I want, e.g.

How do I convert multiple PDFs into images from the same folder in Python?

A Python script would work fine, but any other way to do this on Windows 10 (ImageMagick, for example) is fine with me too.

Christoph Rackwitz
Chris
  • You can convert multipage PDFs to separate JPG files with `convert -density XXX image.pdf -set filename:fn "%[filename:fn]_%3d.jpg"`. That will produce image_001.jpg, image_002.jpg, etc. The filename part is how ImageMagick automatically sets the output name to match the input name. The %3d sets the page counter part. – fmw42 Nov 23 '22 at 17:08
  • But how to batch process multiple files and batch create corresponding subfolders? – Chris Nov 23 '22 at 21:01
  • Write a .bat script with a FOR loop over each PDF. Use %[filename] for the directory as well as the file name. "%[filename:fn]/%[filename:fn]_%3d.jpg" should name both the folder and the file after the input name. – fmw42 Nov 23 '22 at 21:28
  • And convert is able to create a folder if inexistent? Do you maybe have a link to an example for such .bat loop? Apologies for my ignorance. – Chris Nov 23 '22 at 21:32
  • Imagemagick will not create new directories. They will have to exist already. Your .bat script can create the directories and then call Imagemagick. Sorry, I am not a Windows user and do not script .bat. – fmw42 Nov 24 '22 at 01:05

1 Answer

Your comment asks how a batch file can do what is required. For simplicity, the following processes only a single file, so Python would need to loop through the folder and call the batch file with each name in turn. That could also be done by adding a FOR loop to the batch file, but first see where problems arise, as many of my single test files threw differing errors.
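The Python side of that loop could look roughly like the sketch below. It is only an illustration under some assumptions: the function names are made up here, the batch file is the pdf2dir.bat from this answer, and on Windows `subprocess.run` is assumed to start a .bat file when given its path directly. Because the batch file ends with `pause`, its exit code says little, so the sketch checks instead whether a non-empty folder named after the PDF exists afterwards. For unattended runs the `explorer` and `pause` lines would probably need to be removed or prefixed with REM.

```python
import subprocess
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def convert_folder(folder, batch_file, run=subprocess.run):
    """Call the batch file once per PDF; return the names that produced nothing.

    `run` is injectable only so the loop can be exercised without Windows.
    """
    failed = []
    for pdf in find_pdfs(folder):
        # One call per file: "path\to\pdf2dir.bat" "path\to\file.pdf"
        run([str(batch_file), str(pdf)])
        # The batch file writes into a folder named like the PDF without ".pdf"
        out_dir = pdf.with_suffix("")
        if not (out_dir.is_dir() and any(out_dir.iterdir())):
            failed.append(pdf.name)
    return failed


if __name__ == "__main__":
    # Hypothetical paths, adjust to your own setup.
    problems = convert_folder(r"C:\work\pdfs", r"C:\work\pdf2dir.bat")
    print("No output for:", problems)
```

Collecting the failures in a list rather than stopping at the first one fits the observation that different files tend to fail in different ways.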

I have tried to cover several failure cases in this batch file, on my system, but there can still be issues, such as a file that has no valid fonts to display.

For the most recent Poppler utilities for 64-bit Windows, see https://github.com/oschwartz10612/poppler-windows/releases/. For 32-bit, use the latest Xpdf version from http://www.xpdfreader.com/download.html, but that ships pdftopng.exe directly, so the script needs a few edits.

pdf2dir.bat

@echo off
set "bin=C:\Apps\PDF\poppler\22.11.0\Library\bin"
REM output resolution in DPI
set "res=200"
REM for type use one of 3 i.e. png jpeg jpegcmyk (PNG is best for documents)
set "type=png"
REM pdftoppm writes .jpg files for both jpeg and jpegcmyk
set "ext=%type%"
if /i not "%type%"=="png" set "ext=jpg"

if exist "%~dpn1\*.%ext%" echo: &echo Files already exist in "%~dpn1" skipping overwrite&goto pause
if not exist "%~dpn1.pdf" echo: &echo "%~dpn0" File "%~dpn1.pdf" not found&goto pause

if not exist "%~dpn1\*.*" md "%~dpn1"

REM following line deliberately opens folder to show progress delete it or prefix with REM for blind running
explorer "%~dpn1"

"%bin%\pdftoppm.exe" -%type% -r %res% "%~dpn1.pdf" "%~dpn1\%~n1"
REM treat any non-zero exit code as a failure, not only 1
if errorlevel 1 echo: &echo Build of %type% files failed&goto pause
if not exist "%~dpn1\*.%ext%" echo: &echo Build of %type% files failed&goto pause

:pause
echo:
pause
:end

  • The path to the Poppler binaries containing pdftoppm must be set correctly in the set "bin=..." line near the top.
  • The batch file can be placed wherever desired, e.g. the work folder or the desktop.
  • Dragging and dropping a single PDF onto it should work, without needing to run it in a console.
  • It can be run in a command console: type the batch file name followed by a space, then drag and drop a single file name; any name containing spaces must be "double quoted".
  • It can be run from any shell or OS command as "path to/batchfile.bat" "c:\path to\file.pdf"
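One gap remains against the naming asked for in the question: pdftoppm appends the page number with a hyphen and pads it only to the width of the page count (#33-1.jpg or #33-01.jpg), not as #33_001.jpg. A rough Python sketch of a rename pass follows, with a small wrapper that calls pdftoppm directly. The function names, the default of three digits and the `pdftoppm` executable being found on PATH are assumptions made for illustration; only the `-jpeg` and `-r` options already used in the batch file are relied on.

```python
import re
import subprocess
from pathlib import Path

# pdftoppm names its output "<root>-<page>.<ext>"; capture root and page number.
PAGE_NAME = re.compile(r"^(?P<root>.*)-(?P<page>\d+)$")


def rename_pages(out_dir, digits=3):
    """Rename "name-7.jpg" style files in out_dir to "name_007.jpg"."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop
    for path in sorted(Path(out_dir).iterdir()):
        match = PAGE_NAME.match(path.stem)
        if not path.is_file() or match is None:
            continue  # leave anything that does not look like a page image
        page = int(match.group("page"))
        new_name = f"{match.group('root')}_{page:0{digits}d}{path.suffix}"
        path.rename(path.with_name(new_name))
        renamed.append(new_name)
    return sorted(renamed)


def convert_pdf(pdf, pdftoppm="pdftoppm", dpi=200):
    """Render one PDF into a sibling folder named after it, then rename."""
    pdf = Path(pdf)
    out_dir = pdf.with_suffix("")      # "#33.pdf" -> folder "#33"
    out_dir.mkdir(exist_ok=True)
    subprocess.run(
        [pdftoppm, "-jpeg", "-r", str(dpi), str(pdf), str(out_dir / pdf.stem)],
        check=True,                    # raise if pdftoppm reports an error
    )
    return rename_pages(out_dir)
```

The rename step works on its own, so it can also be pointed at a folder the batch file has already filled.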
K J