Photo extraction from pdf file

Question

Does anyone know of a way I can extract all the JPG images from a PDF file? I am currently using Acrobat, and I have a file containing about 1500 photos that I need to extract. Doing them one at a time would be far too time-consuming. Any ideas?

Thanks.

Duplicate question, answered here: [http://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file](http://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file) — Jeff Bauer, Jan 19 '09 at 03:12

score 1 · Accepted Answer · answered Jan 19 '09 at 03:08

A quick search turned up the tool below; I hope it helps. (I can't think of any reason there'd be 1500 images in a PDF.)

http://pdf-image-extraction-wizard.lastdownload.com/

answered Jan 19 '09 at 03:08

John Boker

score 1 · Answer 2 · answered Jan 19 '09 at 03:09

There are free utilities that can help you do this. For example, a quick Google search turned up this one.

answered Jan 19 '09 at 03:09

David Crow

score 0 · Answer 3 · answered Feb 23 '18 at 16:13

On a Mac, try the app FileJuicer. It normally works really well at extracting images from PDFs.

answered Feb 23 '18 at 16:13

Jeremy Young

score 0 · Answer 4 · answered Mar 22 '19 at 17:07

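Coding answer. This function shells out to pdfimages, a command-line tool that ships with Poppler (poppler-utils), so that needs to be installed and on your PATH. My original code block also imported PIL, pytesseract (which requires Tesseract, free software) and cv2, but those were there for other functions; only os and subprocess are needed here.

import os
import subprocess

# Extract every embedded image from the PDF into output_dir
def image_exporter(pdf_path, output_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    # '-all' writes each image in its native format (JPEGs stay .jpg);
    # the last argument is the filename prefix for the extracted images
    cmd = ['pdfimages', '-all', pdf_path,
           os.path.join(output_dir, 'prefix')]
    subprocess.run(cmd, check=True)  # raise if pdfimages fails
    print('Images extracted:')
    print(os.listdir(output_dir))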
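
Since the question asks for JPGs specifically, note that pdfimages with -all writes each image in whatever format it was stored in, so the output directory may also contain other formats such as PNG. Here is a minimal sketch of filtering the results down to JPEGs afterwards. The helper name list_jpegs is hypothetical (it is not part of the answer above), and it assumes the extracted files carry .jpg or .jpeg extensions.

```python
import os


def list_jpegs(output_dir):
    """Return the sorted paths of the .jpg/.jpeg files in output_dir.

    Hypothetical helper: it only looks at file extensions, it does not
    inspect file contents.
    """
    jpeg_exts = ('.jpg', '.jpeg')
    return sorted(
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.lower().endswith(jpeg_exts)
    )
```

For example, after calling image_exporter('photos.pdf', 'out') you could call list_jpegs('out') to get just the photo files (both paths here are placeholders).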
