How to extract images from PDF using Ghostscript or ImageMagick?

Question

I need to render or extract all the images from a specific PDF file. How can I achieve this using Ghostscript or ImageMagick?

score 26 · Answer 1 · answered Jun 24 '13 at 19:35

You cannot do it with Ghostscript, but you can do it with pdfimages, a command-line tool that ships with both Poppler and Xpdf:

pdfimages -j some.pdf subdir/image-prefix

All the images will now be located in subdir/ (which must already exist), named image-prefix-000.jpg, image-prefix-001.jpg ...

The -j parameter makes the command write images that are stored as JPEG (DCT) data directly as .jpg files. All other images are written as PBM (monochrome) or PPM files, which you can always convert using ImageMagick:

convert subdir/image-prefix-033.ppm subdir/image-prefix-033.jpeg
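
The two steps above can be combined in a small script. This is only a minimal sketch: `extract_pdf_images` is a hypothetical helper name, not part of Poppler or ImageMagick, and it assumes `pdfimages` and `convert` are on the PATH (with ImageMagick 7 you would likely call `magick` instead of `convert`).

```shell
# Sketch: extract embedded images with pdfimages, then convert any
# PBM/PPM output to JPEG with ImageMagick.
# extract_pdf_images is a hypothetical helper name.
extract_pdf_images() {
    pdf=$1                 # input PDF
    outdir=$2              # directory that receives the images
    prefix=${3:-image}     # file name prefix passed to pdfimages

    # Create the output directory first; pdfimages writes into an existing one.
    mkdir -p "$outdir" || return 1

    # -j saves JPEG (DCT) images as .jpg; other images become .pbm/.ppm.
    pdfimages -j "$pdf" "$outdir/$prefix" || return 1

    # Convert whatever was not written as JPEG.
    for f in "$outdir/$prefix"-*.ppm "$outdir/$prefix"-*.pbm; do
        [ -e "$f" ] || continue        # skip glob patterns that matched nothing
        convert "$f" "${f%.*}.jpg"     # same base name, .jpg extension
    done
}
```

Usage: `extract_pdf_images some.pdf subdir image-prefix`. The original PBM/PPM files are left in place so nothing is lost if a conversion fails.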

answered Jun 24 '13 at 19:35

Kurt Pfeifle


I want to do something similar, but I want to completely remove all images from the PDF, i.e. the PDF should contain only text/fonts but no images. Can this be done? Please help. – hussainb Dec 19 '13 at 08:19
@codin: Comments are not for discussing an entirely new topic. Please ask a new question, tag it as `[ghostscript]` + `[pdf]` and I'll try to answer it as best as I can. Please also state what's the purpose of your request. Saving on filesize? Remove info contained in the images? Or? – Kurt Pfeifle Dec 19 '13 at 12:27
For the second step, [Mogrify](https://imagemagick.org/script/mogrify.php) seems more appropriate: `magick mogrify -format jpg *.ppm` – Dorian Grv Apr 14 '20 at 08:24
Why don't you use convert directly? `convert some.pdf image-prefix.jpg` – patxiska Apr 22 '22 at 20:30
@patxiska Why don't you try both variants to see the difference yourself?? (Your `convert` transforms PDF pages into full-page images which even include the text parts of the pages; `pdfimages` *extracts* images embedded inside PDF pages without the text parts.) – Kurt Pfeifle Apr 22 '22 at 23:05

score 2 · Answer 2 · edited Apr 13 '17 at 12:22

You certainly can't do it in Ghostscript, without coding yourself a Ghostscript device.
I doubt you can do it with ImageMagick either.
Have you looked at PDFtk?

If you are on Windows then a quick Google turns up:

http://www.somepdf.com/some-pdf-image-extract.html

And on Linux:

https://askubuntu.com/questions/150100/extracting-images-from-a-pdf

edited Apr 13 '17 at 12:22

Community


answered Jun 12 '13 at 12:23

KenS


The reason I want to use Ghostscript or ImageMagick is that they are scriptable tools that I can run from Java. Can you recommend any PDF tool that achieves this through command-line commands or scripts? – mmoghrabi Jun 12 '13 at 12:28
pdftk can extract attachments, but it doesn't seem to extract images. – bonh May 06 '15 at 17:50

score -1 · Answer 3 · answered Feb 28 '23 at 14:07

ImageMagick also offers the option to convert PDFs to images using the following syntax:

convert /path/to/file.pdf /path/to/output/file.png

Apart from the "regular" conversion it offers many useful options like:

- Convert only some of the pages by appending `[0-n]` to the PDF filename, e.g. `convert "file.pdf[0-1]" /path/to/output/file.png`
- Use the crop box defined in the PDF document: `-define pdf:use-cropbox=true`
- Change the output density (DPI): `-density 300`
- Scale images down to a maximum size, e.g. 2000x2000 px: `-resize '2000x2000>'` (quoted so the shell does not treat `>` as a redirection)
- Set a background color for PDFs with transparency: `-background white`
- Remove the alpha channel: `-alpha remove -alpha off`

and many more.
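
Several of these options can be combined in a single call. The following is a hedged sketch rather than a definitive recipe: `render_pdf_pages` is a hypothetical helper name, the values (300 dpi, 2000x2000) are just the ones mentioned above, and ImageMagick needs Ghostscript installed to read PDFs. Note that this renders whole pages, including text; it does not pull out the embedded images.

```shell
# Sketch: render the first pages of a PDF to PNG files with ImageMagick.
# render_pdf_pages is a hypothetical helper name.
render_pdf_pages() {
    pdf=$1          # input PDF
    out=$2          # output file or pattern, e.g. out/page-%02d.png
    last=${3:-1}    # zero-based index of the last page to render

    # Settings that control how the PDF is read (-density, -define) go
    # before the input file; image operators come after it.
    convert -density 300 -define pdf:use-cropbox=true \
        "${pdf}[0-${last}]" \
        -background white -alpha remove -alpha off \
        -resize '2000x2000>' \
        "$out"
}
```

Usage: `render_pdf_pages file.pdf 'page-%02d.png' 1` renders the first two pages. With ImageMagick 7, `magick` would likely replace `convert`.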

score -3 · Answer 4 · answered May 14 '16 at 23:22

Example of rendering a single page (page 1) as a 300 dpi grayscale PNG:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dFirstPage=1 -dLastPage=1 -sOutputFile=1.png in.pdf
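
The same command can be generalized to a page range. This is a sketch under assumptions: `gs_render_pages` is a hypothetical helper name, and the `png16m` (24-bit color) device and 300 dpi resolution are illustrative choices. Like the single-page command, it renders page appearance and does not extract the original embedded images.

```shell
# Sketch: render a range of PDF pages to one PNG per page with Ghostscript.
# gs_render_pages is a hypothetical helper name.
gs_render_pages() {
    pdf=$1            # input PDF
    first=$2          # first page to render (1-based)
    last=$3           # last page to render
    outdir=${4:-.}    # output directory, defaults to the current one

    mkdir -p "$outdir" || return 1

    # -r300 sets the resolution in dpi; %03d in -sOutputFile is replaced
    # by a running counter, so each rendered page gets its own file.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 \
        -dFirstPage="$first" -dLastPage="$last" \
        -sOutputFile="$outdir/page-%03d.png" "$pdf"
}
```

Usage: `gs_render_pages in.pdf 1 3 out` writes one PNG per rendered page into `out/`.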

answered May 14 '16 at 23:22

user2053898


This doesn't extract the original images. It renders an image based on the appearance of the page. For example, if you have text overlaid on the image, you'll get that in your rendered image too. – mlissner Sep 26 '16 at 22:31
