Does anyone know of a way I can extract all the JPG images from a PDF file? I am currently using Acrobat, and I have a file containing about 1500 photos that I need to extract. Doing them one at a time would be far too time-consuming. Any ideas?

Thanks.

Diego Magalhães
  • Duplicate question, answered here: [http://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file](http://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file) – Jeff Bauer Jan 19 '09 at 03:12

4 Answers

A quick search turned up the tool below; I hope it helps. That said, I can't think of any reason there would be 1500 images in a PDF.

http://pdf-image-extraction-wizard.lastdownload.com/

John Boker

There are free utilities that can help you do this. For example, a quick Google search turned up this one.

David Crow

On a Mac, try the app FileJuicer. It normally works really well at extracting images from PDFs.

Jeremy Young

Coding answer. This requires the pdfimages command-line tool (free software, shipped with poppler-utils) to be installed and on your PATH. Only os and subprocess are needed for this bit of code; the other packages I had imported (PIL, pytesseract, cv2) were for other functions in the same code block.

import os
import subprocess

# Extract every embedded image from the PDF into output_dir
def image_exporter(pdf_path, output_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    # -all writes each image in its native format (JPEGs stay .jpg);
    # output files are named prefix-000.jpg, prefix-001.png, ...
    cmd = ['pdfimages', '-all', pdf_path,
           os.path.join(output_dir, 'prefix')]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    print('Images extracted:')
    print(os.listdir(output_dir))
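
Since the question asks specifically for JPGs, here is a rough sketch of a variant that keeps only the JPEG output. It assumes pdfimages is on your PATH and uses its -j flag, which saves DCT-encoded (JPEG) images as .jpg files while other images are written as .ppm/.pbm. The names extract_jpegs and list_jpegs and the "img" prefix are made up for illustration; adjust to taste.

```python
import shutil
import subprocess
from pathlib import Path


def list_jpegs(output_dir):
    """Return the .jpg/.jpeg files in output_dir, sorted by name."""
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )


def extract_jpegs(pdf_path, output_dir, prefix="img"):
    """Run pdfimages on pdf_path and return the JPEG files found afterwards."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found; install poppler-utils first")

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # -j: save JPEG-encoded images as .jpg; everything else becomes .ppm/.pbm
    cmd = ["pdfimages", "-j", str(pdf_path), str(out / prefix)]
    subprocess.run(cmd, check=True)

    # Ignore the non-JPEG files and hand back only the photos
    return list_jpegs(out)
```

Calling extract_jpegs("photos.pdf", "out") (with a hypothetical photos.pdf) should leave the JPEGs in out/ and return their paths, so all 1500 photos can be processed in one go.
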

Evan Mata