
I am new to PDF document processing with PHP. I have two questions:

  1. Image search in PDF: Is it possible to check whether the pages of a PDF contain images? If so, how?

  2. Check image type: If a page does contain images, how can I check the image type (that is, whether it is a vector image or some other type)?

Can anyone suggest some ideas on how to do this?

Amal Murali
user2943773
  • One approach would be to extract the images from the PDF and then check the image type with PHP. –  Dec 26 '13 at 10:16
  • Here is a nice article http://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file –  Dec 26 '13 at 10:24
  • @Muhammet thank you for your response. But before grabbing the images from the PDF with pdfimages, we need to check whether the page has an image and whether it is a vector image. If the image is a vector image, I need to convert it to TIFF; other formats should be converted to JPEG. – user2943773 Dec 26 '13 at 10:28
  • Related: https://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file – StayOnTarget Mar 06 '19 at 12:45
  • This is probably a duplicate of: https://stackoverflow.com/questions/8243058/programatically-check-if-the-file-is-raster-or-vector-pdf-eps-ai – StayOnTarget Mar 06 '19 at 12:45

2 Answers


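There is no way to determine the image type without extracting the image from the PDF.
You can extract the images from the PDF and then check the image type with PHP.
See: How can I extract images from a PDF file? (https://stackoverflow.com/questions/430707/how-can-i-extract-images-from-a-pdf-file)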
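
As a rough illustration of the "check the type with PHP" step, here is a minimal sketch that assumes the images were already extracted to a format PHP recognizes (for example PNG or JPEG). The helper name `describeImageBytes` is hypothetical; `getimagesizefromstring()` is PHP's built-in function. It only identifies raster formats, so it says nothing about vector content.

```php
<?php
// Hypothetical helper (not from this thread): report the raster format of
// image bytes using PHP's built-in getimagesizefromstring().
function describeImageBytes(string $bytes): ?array
{
    // Suppress the warning PHP may emit for unreadable data;
    // a false return value means "not a recognized raster image".
    $info = @getimagesizefromstring($bytes);
    if ($info === false) {
        return null;
    }

    return [
        'width'  => $info[0],
        'height' => $info[1],
        'mime'   => $info['mime'],
        'isJpeg' => $info[2] === IMAGETYPE_JPEG,
    ];
}

// Usage with a file on disk (the path is a placeholder):
// $info = describeImageBytes(file_get_contents('/path/to/extracted-image.png'));
```

A `null` result means PHP could not recognize the bytes as one of its supported raster formats; the `isJpeg` flag is there only because the question wants to treat JPEG differently from other formats.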

Community

On #2: in general, only bitmap images can reliably be extracted from a PDF. Not always, though: not every bitmap is "an image". Consider, for example, a bitmapped font or that nasty Word sub-function that inserts Symbol characters as 8x8 images. And sometimes, bitmap images are used as fills for vector objects.

Acrobat Pro provides (provided?) a command "Extract all images" that asks for a minimum size to prevent lots of irrelevant little files being created.
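
The same minimum-size idea can be sketched outside Acrobat with the `pdfimages` tool mentioned in the comments. This sketch assumes poppler's `pdfimages` is installed and that its `-list` option prints a table whose columns start with page, num, type, width, height, color, comp, bpc, enc; check this against your installed version. Both function names are hypothetical.

```php
<?php
// Parse the table printed by `pdfimages -list` and keep only images that are
// at least $minSize pixels wide and high.
// Assumed column order: page num type width height color comp bpc enc ...
function parsePdfImagesList(string $listing, int $minSize = 0): array
{
    $images = [];
    foreach (preg_split('/\R/', $listing) as $line) {
        $cols = preg_split('/\s+/', trim($line));
        // Skip the header row, the dashed rule, and blank lines.
        if (count($cols) < 9 || !preg_match('/^\d+$/', $cols[0])) {
            continue;
        }
        $width  = (int) $cols[3];
        $height = (int) $cols[4];
        if ($width < $minSize || $height < $minSize) {
            continue;
        }
        $images[] = [
            'page'     => (int) $cols[0],
            'type'     => $cols[2],
            'width'    => $width,
            'height'   => $height,
            'encoding' => $cols[8],
        ];
    }
    return $images;
}

// Hypothetical wrapper: runs pdfimages, which must be installed and on PATH.
function listPdfImages(string $pdfPath, int $minSize = 0): array
{
    $output = shell_exec('pdfimages -list ' . escapeshellarg($pdfPath) . ' 2>/dev/null');
    return is_string($output) ? parsePdfImagesList($output, $minSize) : [];
}
```

The pages that contain at least one raster image are then `array_unique(array_column($images, 'page'))`. Keeping the parser separate from the shell call makes it possible to test it on captured output. Vector drawings do not appear in this listing at all.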

Vector images can only be extracted under very specific circumstances. In most cases, the vector data is embedded on a page together with "regular" page content, and so there is no real difference between lines that form an image and lines that draw an underline under some plain text.
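
To make the distinction concrete, here is a rough heuristic sketch, not a PDF parser. It counts XObject dictionaries in the raw file bytes by their `/Subtype` name. The function name `countXObjects` is hypothetical, and the approach assumes the dictionaries of stream objects are readable as plain text in the file.

```php
<?php
// Heuristic only (an assumption, not a full PDF parser): count XObject
// dictionaries in the raw file bytes by their /Subtype name.
function countXObjects(string $pdfBytes): array
{
    return [
        // Raster images stored as image XObjects.
        'image' => (int) preg_match_all('~/Subtype\s*/Image\b~', $pdfBytes),
        // Form XObjects: reusable content that MAY be vector art, but can
        // equally hold text or references to raster images.
        'form'  => (int) preg_match_all('~/Subtype\s*/Form\b~', $pdfBytes),
    ];
}

// Usage (the path is a placeholder):
// $counts = countXObjects(file_get_contents('/path/to/document.pdf'));
```

A non-zero `image` count suggests the file contains raster images, but this check misses inline images inside compressed content streams and does not say which page uses them. A non-zero `form` count is only a hint: vector drawing commands placed directly in a page's content stream are not detected, which is exactly the limitation described above.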

Jongware
  • I mostly agree with you, but what are the Vector images you are talking about? – Hugo Moreno Dec 28 '13 at 17:43
  • @HugoMoreno: I was thinking of XObjects, which (I think) are complete objects, ready to be "pasted in". Acrobat X Pro's "Create Inventory" contains these: the data images all have the same XObject magnifying glass drawing. (You still need to translate PDF commands to your vector language of choice.) – Jongware Dec 31 '13 at 00:23