I have a requirement where the customer uploads a PDF file that contains one or more images. I have to read that PDF file, extract the images from it, and then save them both to the database and to the hard disk. But I don't know how to extract images from a PDF file using Python/Django code. Is there a Python library that reads a PDF file and extracts its images?

Thanks in advance.

sandeep

1 Answer

I am not sure whether you will find a Python library for that. But if you are okay with an external tool, then pdfimages can do the job:

http://en.wikipedia.org/wiki/Pdfimages

I used it with subprocess for a project of mine.
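
Roughly, calling it through subprocess can look like the sketch below. This is not the exact code from my project; the function names, the `img` prefix, and the output directory layout are made up for illustration, and it assumes the `pdfimages` binary is on the PATH. pdfimages takes the PDF file and an "image root" and writes files named like `<root>-000.ppm`.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, image_root):
    """Build the argument list: pdfimages <PDF-file> <image-root>."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def extract_images(pdf_path, out_dir, prefix="img"):
    """Run pdfimages and return the files it wrote into out_dir.

    Hypothetical helper: assumes pdfimages is installed and on the PATH.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfimages exits with an error.
    subprocess.run(build_command(pdf_path, out_dir / prefix), check=True)
    # Output files are named <prefix>-NNN.<ext>, e.g. img-000.ppm
    return sorted(out_dir.glob(prefix + "-*"))
```

In a Django view you would first write the uploaded file to disk, call something like `extract_images(saved_path, some_dir)`, and then store the returned files however your models require.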

lazy functor
  • Yes, 'pdfimages' is a nice command and it worked for me, but the images come out in .ppm format. Can we save them in .jpeg format? Also, 'pdfimages' works on my local machine, which runs Ubuntu 12.04, but not on the server, which runs an older version of Ubuntu. Do I need to install pdfimages there? – sandeep Aug 09 '13 at 07:48
  • pdfimages is part of the poppler-utils package. For JPEG, try the -j option (from the man page), with the caveat that it only works for DCT images. – lazy functor Aug 09 '13 at 08:01
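
Putting the two comments together, a hedged sketch of the server-side check and the -j option might look like this. The helper names are hypothetical; the only things taken from the thread are the `pdfimages` command, its `-j` flag, and the poppler-utils package name. Images that are not DCT-encoded in the PDF will still be written as .ppm/.pbm, so the sketch separates the two groups rather than assuming everything is a JPEG.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_available():
    """Return True if a pdfimages executable can be found on the PATH."""
    return shutil.which("pdfimages") is not None


def split_by_format(files):
    """Split paths into (.jpg files, everything else such as .ppm/.pbm)."""
    jpegs = [f for f in files if Path(f).suffix == ".jpg"]
    others = [f for f in files if Path(f).suffix != ".jpg"]
    return jpegs, others


def extract_with_jpeg_option(pdf_path, out_dir, prefix="img"):
    """Run `pdfimages -j` and report which outputs came out as JPEG."""
    if not pdfimages_available():
        raise RuntimeError("pdfimages not found; install the poppler-utils package")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # -j writes DCT-encoded images as .jpg; other images keep the default format.
    subprocess.run(
        ["pdfimages", "-j", str(pdf_path), str(out_dir / prefix)], check=True
    )
    return split_by_format(sorted(out_dir.glob(prefix + "-*")))
```

Whatever ends up in the second list would need a separate conversion step if JPEG is required for every image.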