
I have an XML file that contains a base64-encoded PDF file. The PDF holds nothing but an image taken on an iPad.

This PDF can be excessively large: as much as 14MB, with page dimensions of 57" x 38".

These images are taken on an iPad during a DocuSign session, so at the moment I have no way of controlling their size or format before they reach my PHP listener script.

However, my script cannot handle files that large because my CRM's API has a 10MB file size limit, so I need a way to reduce the file size before I upload it through the API.
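As a rough sketch, the listener could check whether a payload is over the limit before doing any conversion. The function names and the `MAX_UPLOAD_BYTES` constant below are made up for illustration, and the sketch assumes the 10MB limit applies to the decoded bytes rather than to the base64 text.

```php
<?php
// Assumption: the CRM's 10MB limit counts decoded bytes, not base64 characters.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

/**
 * Size in bytes of the data a base64 string decodes to,
 * computed from its length so the payload is not decoded twice.
 */
function decodedSize(string $base64): int
{
    // XML payloads often wrap base64 across lines, so drop whitespace first
    $clean = preg_replace('/\s+/', '', $base64);
    // Each trailing '=' means one byte less in the final 3-byte group
    $padding = strlen($clean) - strlen(rtrim($clean, '='));
    return intdiv(strlen($clean) * 3, 4) - $padding;
}

/**
 * True when the decoded payload would exceed the upload limit.
 */
function needsShrinking(string $base64, int $limit = MAX_UPLOAD_BYTES): bool
{
    return decodedSize($base64) > $limit;
}
```

Files that pass the check could be uploaded unchanged, and only the oversized ones would need to be converted.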

If it were just a JPG, that would be fine, as there are plenty of ways to reduce file size in PHP, but it is a PDF. I have found plenty of PHP extensions for creating PDFs, but none for reading a PDF and extracting an image from it.

So is there a way to extract the image from the PDF through PHP, or perhaps compress the pdf file?

UPDATE

I didn't think about the possibility of converting a PDF into a JPG, which apparently is easier to do with Imagick. I am having my server admin install it, and I will see whether I can make it work with my script.

UPDATE 2

I was able to get Imagick working, and locally I can convert PDF files into JPGs and reduce the file size dramatically.

However, I am running into an issue using it with my application. I get the following error from my CRM's API:

Failed to parse XML-RPC request: Invalid byte 1 of 1-byte UTF-8 sequence.

So the process is the following:

  1. The XML file has a base64-encoded data stream of the PDF file.
  2. I decode this data.
  3. I convert it with Imagick and reduce the file size.
  4. I base64 encode the result and prep it for upload.

CODE

        $imageBlob = base64_decode((string)$pdf->PDFBytes);
        $imagick.$x = new Imagick();
        $imagick.$x->readImageBlob($imageBlob);
        $imagick.$x->setImageFormat('jpeg');
        $imagick.$x->setImageCompressionQuality(60);
        $imagick.$x->adaptiveResizeImage(1024,768,true);
        $imageBlob = $imagick.$x->getImageBlob();
        $PDFdata[] = base64_encode($imageBlob);

I can test the data by sending it with the proper header, and the new JPEG displays fine, so I assume the data is properly formatted.

What am I missing?

David Avellan
  • What do you send to the CRM? A JPG, a PDF, or a base64 string? – bdn02 Jan 14 '16 at 20:26
  • @bdn02 - I have been sending base64 encoded string. It can be pdf, jpg, gif, or any other file as long as it is under 10MB. – David Avellan Jan 14 '16 at 20:57
  • Is optimization an option? - i.e. resample images reducing resolution and file sizes with some impact on quality. – dwarring Jan 14 '16 at 22:09
  • @dwarring - yes that is what I am hoping to do. But my issue is how do I get the images out of the pdf? Unfortunately I cannot change how I get the file as it comes from DocuSign and is always a pdf. – David Avellan Jan 14 '16 at 22:22
  • @David If you can use the shell, and have any of ghostscript, xpdf or image-magick available, this thread might be useful - http://stackoverflow.com/questions/10450120/optimize-pdf-files-with-ghostscript-or-other – dwarring Jan 14 '16 at 22:51
  • @dwarring - I have shell access on my server, but none of those (ghostscript, xpdf, image-magick) are available right now. I am also not sure whether I can run a shell command in the middle of a PHP script and have the script wait for it to finish. This script is a listener, so it runs automatically. I am not an expert programmer, and this is a little over my head. I did find something promising about converting a PDF to a JPG with Imagick through PHP, so I will have to look more into how to do that. – David Avellan Jan 14 '16 at 23:00

1 Answer


Ok, so I figured it out.

Imagick was the way to go, and my use of it was fine. I just goofed up on the file name because I wasn't using a proper dynamic variable name. The code should have looked like this:

CODE

$imageBlob = base64_decode((string)$pdf->PDFBytes);
${'imagick'.$x} = new Imagick();
${'imagick'.$x}->readImageBlob($imageBlob);
${'imagick'.$x}->setImageFormat('jpeg');
${'imagick'.$x}->setImageCompressionQuality(60);
${'imagick'.$x}->adaptiveResizeImage(1024,768,true);
$imageBlob = ${'imagick'.$x}->getImageBlob();
$PDFdata[] = base64_encode($imageBlob);
$PDFfile[] = $FormCustomField . $x . '.jpg';

So the error I was getting was caused by an invalid file name: the $x variable in the previous code was getting junk values, and it is part of the file name. Now everything works fine.
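A sketch of the same conversion without variable variables, which avoids this kind of mistake altogether. In the earlier code, `$imagick.$x = new Imagick()` was most likely parsed as a concatenation containing an assignment to `$x`, which would explain the junk values. The helper names `blobToJpeg` and `jpegFileName` are hypothetical. The Imagick calls are the ones used above, plus `setIteratorIndex(0)` and `clear()`, on the assumption that only the first page of the PDF matters. The file name check assumes the CRM rejects anything that is not valid UTF-8, as the error message suggests.

```php
<?php
/**
 * Convert an image or PDF blob to a resized JPEG blob (first page only).
 * Reading PDFs through Imagick generally needs Ghostscript on the server.
 */
function blobToJpeg(string $blob, int $maxWidth = 1024, int $maxHeight = 768, int $quality = 60): string
{
    $imagick = new Imagick();
    $imagick->readImageBlob($blob);
    $imagick->setIteratorIndex(0);                  // assumption: first page only
    $imagick->setImageFormat('jpeg');
    $imagick->setImageCompressionQuality($quality);
    // true = best fit: keep the aspect ratio inside the given box
    $imagick->adaptiveResizeImage($maxWidth, $maxHeight, true);
    $jpeg = $imagick->getImageBlob();
    $imagick->clear();                              // release the image memory
    return $jpeg;
}

/**
 * Build the upload file name and refuse anything that is not valid UTF-8.
 */
function jpegFileName(string $prefix, int $index): string
{
    $name = $prefix . $index . '.jpg';
    // With the /u modifier, preg_match returns false for invalid UTF-8 input
    if (preg_match('//u', $name) !== 1) {
        throw new InvalidArgumentException('File name is not valid UTF-8');
    }
    return $name;
}
```

Inside the existing loop this would be used roughly as `$PDFdata[] = base64_encode(blobToJpeg(base64_decode((string)$pdf->PDFBytes)));` followed by `$PDFfile[] = jpegFileName($FormCustomField, $x);`, with `$x` remaining a plain counter.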

David Avellan