I need to render or fetch all the images from a specific PDF file. How can I achieve this using Ghostscript or ImageMagick?
Better to use pdfimages. – fmw42 Feb 28 '23 at 17:39
4 Answers
You cannot do it with Ghostscript, but you can do it with the command-line tool pdfimages, which ships with both Poppler and XPDF:
pdfimages -j some.pdf subdir/image-prefix
All the images will now be located in subdir/, named image-prefix-000.jpg, image-prefix-001.jpg, ...
The -j parameter makes pdfimages save DCT-encoded (JPEG) images directly as JPEG files. All other images are written as PPM files (or PBM for monochrome images), which you can always convert using ImageMagick:
convert subdir/image-prefix-033.ppm subdir/image-prefix-033.jpeg
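The extraction and conversion steps can be wrapped in a small bash function. This is a sketch, assuming `pdfimages` (Poppler or XPDF) and ImageMagick's `convert` are on the PATH; the function name `extract_pdf_images` and its argument order are hypothetical, chosen for illustration. It creates the output directory, runs `pdfimages -j`, then converts any leftover PPM/PBM files to PNG and deletes the originals.

```shell
#!/usr/bin/env bash
# Sketch only: extract embedded images with pdfimages, then convert the
# non-JPEG leftovers to PNG with ImageMagick. Assumes `pdfimages` and
# `convert` are on the PATH. extract_pdf_images is a hypothetical helper name.
extract_pdf_images() {
  local pdf="$1" outdir="$2" prefix="${3:-image}"
  local f

  # pdfimages does not create missing directories, so create it first.
  mkdir -p "$outdir" || return 1

  # -j: save DCT-encoded images as .jpg; all others become .ppm/.pbm.
  pdfimages -j "$pdf" "$outdir/$prefix" || return 1

  # Convert each PPM/PBM file to PNG and remove the original on success.
  for f in "$outdir/$prefix"-*.ppm "$outdir/$prefix"-*.pbm; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    convert "$f" "${f%.*}.png" && rm -- "$f"
  done
}
```

Usage: `extract_pdf_images some.pdf subdir image-prefix`. If your pdfimages comes from a recent Poppler release, its `-png` or `-all` options may make the conversion loop unnecessary; `pdfimages -h` lists what your version supports.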

I want to do something similar, but I want to completely remove all images from the PDF, i.e. the PDF should contain only text/fonts and no images. Can this be done? Please help. – hussainb Dec 19 '13 at 08:19
@codin: Comments are not for discussing an entirely new topic. Please ask a new question, tag it as `[ghostscript]` + `[pdf]` and I'll try to answer it as best as I can. Please also state the purpose of your request. Saving on file size? Removing info contained in the images? Or? – Kurt Pfeifle Dec 19 '13 at 12:27
For the second step, [Mogrify](https://imagemagick.org/script/mogrify.php) seems more appropriate: `magick mogrify -format jpg *.ppm` – Dorian Grv Apr 14 '20 at 08:24
Why don't you use convert directly? `convert some.pdf image-prefix.jpg` – patxiska Apr 22 '22 at 20:30
@patxiska Why don't you try both variants to see the difference yourself?? (Your `convert` transforms PDF pages into full-page images which even include the text parts of the pages; `pdfimages` *extracts* images embedded inside PDF pages without the text parts.) – Kurt Pfeifle Apr 22 '22 at 23:05
You certainly can't do it in Ghostscript without coding your own Ghostscript device.
I doubt you can do it with ImageMagick either.
Have you looked at PDFtk?
If you are on Windows, a quick Google search turns up:
http://www.somepdf.com/some-pdf-image-extract.html
And on Linux:
https://askubuntu.com/questions/150100/extracting-images-from-a-pdf
The reason I want to use GS or ImageMagick is that they are command-line tools I can script and run from Java. Do you recommend any PDF tool that uses command-line commands or scripts to achieve this? – mmoghrabi Jun 12 '13 at 12:28
ImageMagick also offers the option to convert PDFs to images using the following syntax:
convert /path/to/file.pdf /path/to/output/file.png
Apart from the "regular" conversion, it offers many useful options, such as:
- Rendering only some pages, by appending a zero-based page range in brackets to the PDF filename, e.g. convert "file.pdf[0-1]" /path/to/output/file.png
- Using the crop box defined in the PDF document: -define pdf:use-cropbox=true (place it before the input filename)
- Changing the output density (DPI): -density 300 (place it before the input filename, since it controls rasterization)
- Scaling images to a maximum size, e.g. at most 2000x2000 px: -resize '2000x2000>' (quoted so the shell does not treat > as a redirect)
- Setting a background color for PDFs with transparency: -background white
- Removing the alpha channel: -alpha remove -alpha off
and many more.
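The options in the list can be combined into one command; the main pitfall is ordering. A minimal bash sketch, assuming ImageMagick's `convert` and Ghostscript (which ImageMagick uses to rasterize PDFs) are installed; the function name `pdf_pages_to_png` and the `DRY_RUN` variable are hypothetical, added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: render a page range of a PDF to PNG with ImageMagick's convert.
# pdf_pages_to_png and DRY_RUN are hypothetical names for this example.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_pages_to_png() {
  local pdf="$1" pages="$2" out="$3"
  # Settings that control rasterization (-density, pdf:use-cropbox) must
  # come BEFORE the input file; the remaining operators act on the
  # rendered images and therefore come after it.
  # '2000x2000>' only shrinks images larger than 2000x2000.
  local -a cmd=(
    convert
    -density 300
    -define pdf:use-cropbox=true
    "${pdf}[${pages}]"
    -background white -alpha remove -alpha off
    -resize '2000x2000>'
    "$out"
  )
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `pdf_pages_to_png file.pdf 0-1 out.png` renders the first two pages; when several pages are written to one PNG name, ImageMagick numbers the files (out-0.png, out-1.png). On ImageMagick 7, `magick` takes the place of `convert`.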

Example rendering page 1 at 300 DPI:
gs -q -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dFirstPage=1 -dLastPage=1 -sOutputFile=1.png in.pdf
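To render every page rather than a single one, Ghostscript substitutes the page number into the output filename when it contains a printf-style `%d` or `%03d`, and the resolution is set with `-r` (e.g. `-r300`). A minimal bash sketch, assuming `gs` is on the PATH; the function name `render_pdf_pages` and the `page-NNN.png` naming are hypothetical. As with the single-page command, this rasterizes whole pages, so any text drawn over the images ends up in the output too.

```shell
#!/usr/bin/env bash
# Sketch: render every page of a PDF to its own grayscale PNG with Ghostscript.
# render_pdf_pages is a hypothetical helper name.
render_pdf_pages() {
  local pdf="$1" outdir="$2"
  mkdir -p "$outdir" || return 1
  # -r300: 300 DPI; %03d is replaced by the page number (001, 002, ...),
  # so each page is written to a separate file.
  gs -q -dBATCH -dNOPAUSE \
     -sDEVICE=pnggray -r300 \
     -sOutputFile="$outdir/page-%03d.png" \
     "$pdf"
}
```

Usage: `render_pdf_pages in.pdf pages` writes pages/page-001.png, pages/page-002.png, and so on.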

This doesn't extract the original images. It renders an image based on the appearance of the page. For example, if you have text overlaid on the image, you'll get that in your rendered image too. – mlissner Sep 26 '16 at 22:31