
In case the title doesn't explain it well enough, here are the details. I have a PDF file created with an Android app that lets you attach files to a PDF. This is a standard PDF feature provided by Adobe. There are PHP libraries that can do almost anything with a PDF file, and many even support adding these attachments. What I need is the reverse: to extract the image (.jpg) attached to the PDF and save it to the server. Saving the PDF file itself isn't my concern; I just need to figure out how to get at the files inside the PDF.

Any help would be awesome, and I can provide an example PDF if requested. Email me at AriderM@gmail.com. Thanks!

EDIT: I'm looking to gain access to these files in PHP.
http://blogs.adobe.com/insidepdf/2010/11/pdf-file-attachments.html

EDIT: I do mean attached to the file, similar to how an email attachment works: the file is there, but it is not embedded in the content.

  • What have you tried so far? What code do you have? Perhaps you should read on how to ask a question here. http://stackoverflow.com/questions/how-to-ask – skrilled Mar 12 '14 at 00:35
  • Basically the flow is this: 1. read attached files from the PDF; 2. save files to the server; 3. create a new PDF (done with fpdf); 4. place a page from the original PDF into the new PDF file (done with fpdi); 5. place the image on top of the new PDF (done with fpdf); 6. save the PDF and email it to a mailing list (done with PHPMailer). – user3238095 Mar 12 '14 at 00:38
  • So look into the question a little deeper. I can edit it. I need to access these: http://blogs.adobe.com/insidepdf/2010/11/pdf-file-attachments.html, not what you pointed me to. – user3238095 Mar 12 '14 at 00:43
  • @skrilled: Your second comment above comes off as sarcastic and rude. That aside, the link you provided is to a q/a about editing PDF files, not extracting images from them. – bwright Mar 12 '14 at 02:04
  • It comes across as sarcastic and rude because it is. The OP didn't read on how to ask a question, and simply continued on a tirade about what he demands of his application. People who don't use google before they ask a question don't deserve anyone's time. I suppose you expect people to hold their hand and teach them how to use a search engine though? Get real sir. This site is for teaching future programmers, not doing free work for people. – skrilled Mar 12 '14 at 21:36
  • I've done many extensive searches, and everything that comes up is about how to attach a PDF to an email or how to edit a PDF with PHP, both of which I've already completed. If you read the question thoroughly, along with my response to your comment and the link I included, you would see that this is something that hasn't been done quite yet, or at least I haven't come across any similar situations. – user3238095 Mar 13 '14 at 20:25

1 Answer


As for a method using PHP, numerous questions about this have been posted to Stack Overflow. The answers are largely the same, and I have been unable to find any alternative method that doesn't require something beyond PHP.

See also: How can I extract images from a PDF file?

The answers to that question describe several alternative ways of extracting the images. I am assuming you want to do this in an automated fashion, so pdfimages (http://linuxcommand.org/man_pages/pdfimages1.html) would be a good bet.

On a Windows server, you could use Poppler, a PDF library. Binaries for it can be found here: http://blog.alivate.com.au/poppler-windows/

For anyone else interested, Poppler is also available for Linux: http://cgit.freedesktop.org/poppler/poppler/tree/utils
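As a rough illustration of automating pdfimages, here is a minimal Python sketch that shells out to it. The function names `build_pdfimages_command` and `extract_inline_images` are hypothetical, and it assumes the `pdfimages` binary is installed and on the PATH. The `-j` option asks pdfimages to write JPEG-encoded images as .jpg files rather than converting them. Note that this only covers images that are part of the page content.

```python
import shutil
import subprocess


def build_pdfimages_command(pdf_path, image_root):
    """Build the argument list: pdfimages -j <PDF-file> <image-root>."""
    # -j writes DCT (JPEG) images as .jpg files instead of PPM/PBM.
    return ["pdfimages", "-j", pdf_path, image_root]


def extract_inline_images(pdf_path, image_root):
    """Run pdfimages; output files are named from image_root by the tool."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages was not found on PATH")
    # check=True raises CalledProcessError if pdfimages exits with an error.
    subprocess.run(build_pdfimages_command(pdf_path, image_root), check=True)
```

The same command line could be run from PHP with `exec()` or a similar function, which is the "something beyond PHP" mentioned above.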

bwright
  • This method assumes the image is an inline object, whereas the file attachments I'm referring to can contain any file. The link I added to the question describes the feature and briefly covers how the object is stored. I may need to simply load the PDF as a string, run a regex over the object streams to identify attached files, and figure out from there how to use the object stream to recreate the attached file. – user3238095 Mar 12 '14 at 16:05
  • Ah, I seem to have missed your second edit to the question. Using regular expressions to extract the image data may be the best option, and is one I hadn't considered prior to your above comment. I'm sorry I can't offer a better solution. I know from personal experience how frustrating PDFs can be to pull apart or convert to something else. – bwright Mar 13 '14 at 19:43
  • Reading into how PDFs store attachments according to Adobe's standards, it's likely that the compression method being used is Flate. I can't seem to find more than a few points in the PDF file that reference the original attachment. I've opened both the PDF and the .jpg, in this case, with text editors and see no similarities, likely due to the compression. Moving forward, I will look into applying Flate compression to the stored files, comparing the result to the raw data in the PDF, and hopefully using the similarities to come up with a regex that can pull out what I need. – user3238095 Mar 13 '14 at 20:28
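
The idea discussed in these comments (scan the raw PDF bytes for attachment streams and inflate Flate-compressed data) can be sketched roughly as below. This is a minimal Python sketch, not a full PDF parser, and it makes several assumptions: the function name `extract_embedded_files` is hypothetical; the PDF is not encrypted; each attachment's stream dictionary carries `/Type /EmbeddedFile` (the key is optional, so some writers may omit it); and the stream uses either no filter or a single `/FlateDecode` filter. File names live in separate file-specification objects, so results are keyed by object number instead.

```python
import re
import zlib

# "<num> <gen> obj << dict >> stream<EOL>"; the dict may not cross "endobj".
_STREAM_OBJ = re.compile(
    rb"(\d+)\s+\d+\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n", re.DOTALL
)
_EMBEDDED = re.compile(rb"/Type\s*/EmbeddedFile\b")
_FLATE = re.compile(rb"/Filter\s*\[?\s*/FlateDecode\b")
# A direct integer length, e.g. "/Length 1234" but not "/Length 12 0 R".
_DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)(?!\d|\s+\d+\s+R)")


def extract_embedded_files(pdf_bytes):
    """Return {object_number: file_bytes} for each embedded-file stream."""
    found = {}
    for match in _STREAM_OBJ.finditer(pdf_bytes):
        obj_num, header = int(match.group(1)), match.group(2)
        if not _EMBEDDED.search(header):
            continue
        start = match.end()
        length = _DIRECT_LENGTH.search(header)
        if length:
            raw = pdf_bytes[start:start + int(length.group(1))]
        else:
            # Indirect /Length: fall back to the next "endstream" keyword.
            end = pdf_bytes.find(b"endstream", start)
            if end == -1:
                continue
            raw = pdf_bytes[start:end].rstrip(b"\r\n")
        if _FLATE.search(header):
            # decompressobj tolerates stray bytes after the zlib stream.
            raw = zlib.decompressobj().decompress(raw)
        found[obj_num] = raw
    return found
```

Usage would be along the lines of reading the PDF with `open(path, "rb").read()`, calling the function, and writing each value to disk. In PHP, the inflate step roughly corresponds to `gzuncompress()`. Scanning with regular expressions can misfire on unusual files (for example, an uncompressed attachment that is itself a PDF), so a real PDF library is the safer route when one is available.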