
I've had an AWS Lambda function running on S3 objects for the last 18 months, and it died around a month ago after a minor update. I've reverted the update but it's still broken. I've tried the most basic PDF conversion with ImageMagick with no luck, so I think AWS has updated something that caused the PDF module to be removed or to stop working.

Here is a stripped-down version of what my core code does, running on Node.js 8.10:

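gm(response.Body).setFormat("png").stream((err, stdout, stderr) => {
  if (err) {
    console.log('broken');
    return; // the streams are not usable if gm failed to start
  }
  // Collect the converted image as it streams out of ImageMagick
  const chunks = [];
  stdout.on('data', (chunk) => {
    chunks.push(chunk);
  });
  stdout.on('end', () => {
    console.log('gm done!');
  });
  // Anything ImageMagick writes to stderr is logged here
  stderr.on('data', (data) => {
    console.log('std error data ' + data);
  });
});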

with the error response:

std error data convert: unable to load module `/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found

I've also tried moving to Node.js 10.x and using the ImageMagick layer that's available through the AWS Serverless Application Repository. Running the same code there generates this error:

std error data convert: FailedToExecuteCommand `'gs' -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 '-sDEVICE=pngalpha' -dTextAlphaBits=4 -dGraphicsAlphaBits=4 '-r72x72' '-sOutputFile=/tmp/magick-22TOeBgB4WrfoN%d' '-f/tmp/magick-22KvuEBeuJuyq3' '-f/tmp/magick-22dj24vSktMXsj'' (1) @ error/pdf.c/InvokePDFDelegate/292

In both cases the function works correctly when running on an image file instead.

Based on this, I think both the ImageMagick bundled with the Node.js 8.10 runtime and the layer for Node.js 10.x are missing the PDF module, but I'm unsure how to add it or why it was removed in the first place. What's the best way to fix this function that was previously working?

EDIT

So I've downloaded https://github.com/serverlesspub/imagemagick-aws-lambda-2, built the library manually, uploaded it to Lambda and got it working as a layer. However, it doesn't include Ghostscript, which is an optional library for ImageMagick. I've tried adding Ghostscript to Makefile_ImageMagick; it builds and the result contains some references to Ghostscript, but running it doesn't fix the PDF issue (images still work). What's the best way to add the optional Ghostscript library to the Makefile?

Rudiger
  • 1
    Do you have Ghostscript installed for Imagemagick? If you don't, then you need it. If you do, then you might need to edit the delegates.xml file to insert the full path to gs (ghostscript) for PDF related entries. Or you may need to edit your policy.xml file to give read and write permissions for PDF files. See https://stackoverflow.com/questions/52861946/imagemagick-not-authorized-to-convert-pdf-to-an-image/52863413#52863413. Sorry, I do not know or use AWS – fmw42 Jul 17 '19 at 02:44
  • 1
    @fmw42 I don't but I didn't need it before. Did AWS remove it? – Rudiger Jul 17 '19 at 03:18
  • 1
    What do you mean, you did not need it before? On AWS or just ImageMagick? Ghostscript is always needed to read PDF files. It is not needed to write them. If your ImageMagick version was old before, the policy.xml file may not have included the PDF entry. It was added when a security bug in Ghostscript was reported a few months ago. Perhaps AWS updated ImageMagick and that introduced the new entry in the policy.xml file. Or perhaps they left out Ghostscript. – fmw42 Jul 17 '19 at 04:28
  • 1
    @fmw42 I mean this function has been working on AWS Lambda for the last 18 months (there is a lot more to the function, but the part that's broken is shown above). It died about 3 weeks ago. There could have been a change before that, and my redeploying the function may have caused new libraries to be loaded. – Rudiger Jul 17 '19 at 06:13
  • 1
    It seems that is the case: the lib is not included anymore. If so, Lambda developers need some kind of assurance about which libs will be available and which are 'blacklisted'. Are you able to test on a VPC whether the PDF is generated or not? – Donnie Jul 17 '19 at 10:38
  • 3
    Same issue here! Looks like an AWS issue because it's been working for more than a year... – Sergio Costa Jul 17 '19 at 21:00
  • 5
    I believe AWS Lambda no longer includes ghostscript by default, which is the PDF delegate used by ImageMagick. – emcconville Jul 18 '19 at 13:46

4 Answers

17

While the other answers helped, there was still a lot of work to get to a workable solution, so below is how I managed to fix this, specifically for Node.js.

Download: https://github.com/sina-masnadi/lambda-ghostscript

Zip up the bin directory and upload it to Lambda as a layer.

Add https://github.com/sina-masnadi/node-gs to your Node.js modules. You can either upload it as part of your project or, as I did, put it in a layer along with all your other required modules.

Add https://github.com/serverlesspub/imagemagick-aws-lambda-2 as a layer. The easiest way to do this is to create a new function in Lambda, select "Browse serverless app repository", search for "ImageMagick" and select "image-magick-lambda-layer". (You can also build it yourself and upload it as a layer.)

Add the three layers to your function. I added them in this order:

  1. GhostScript
  2. ImageMagick
  3. NodeJS modules

Add the appPath to the require statement for ImageMagick and GhostScript:

var gm = require("gm").subClass({imageMagick: true, appPath: '/opt/bin/'});
var gs = require('gs');

Mine was in an async waterfall, so before my existing processing function I added this function, which converts the file to a PNG if it isn't an image already:

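  var fs = require('fs');

  // fileType is set earlier in the waterfall, outside this snippet
  function convertIfPdf(response, next) {
    if (fileType == "pdf") {
      // Ghostscript works on files here, so write the PDF to Lambda's /tmp first
      fs.writeFile("/tmp/temp.pdf", response.Body, function(err) {
        if (err) {
          return next(err);
        }
        gs().batch().nopause().executablePath('/opt/bin/gs').device('png16m').input("/tmp/temp.pdf").output('/tmp/temp.png').exec(function (err, stdout, stderr){
          if (!err && !stderr) {
            var data = fs.readFileSync('/tmp/temp.png');
            next(null, data);
          } else {
            console.log(err);
            console.log(stderr);
            // Pass the failure on so the waterfall does not hang
            next(err || new Error(String(stderr)));
          }
        });
      });
    } else {
      // Already an image: hand the original bytes to the next step
      next(null, response.Body);
    }
  }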

From then on you can do whatever you were previously doing in ImageMagick, as the data is now in an image format it can read. There may be better ways to do the PDF conversion, but I had issues with the GS library unless I was working with files. If there are better ways, let me know.

If you are having issues loading the libraries, make sure the path is correct; it depends on how you zipped the layer up.
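If the node-gs wrapper is more than you need, Ghostscript can also be invoked directly. The following is a minimal sketch, assuming the gs binary from the layer ends up at /opt/bin/gs. The flags are taken from the gs command visible in the question's error output, with the png16m device used above; `buildGsArgs` and `pdfToPngs` are hypothetical helper names, not part of any library.

```javascript
const { execFile } = require('child_process');

// Assumed location of gs when the layer zip has a top-level bin/ directory.
const GS_PATH = '/opt/bin/gs';

// Build the Ghostscript argument list (hypothetical helper).
function buildGsArgs(inputPath, outputPattern, resolution = 72) {
  return [
    '-dQUIET',
    '-dSAFER',
    '-dBATCH',
    '-dNOPAUSE',
    '-sDEVICE=png16m',
    `-r${resolution}`,
    `-sOutputFile=${outputPattern}`,
    inputPath,
  ];
}

// Run gs on a PDF file; resolves when gs exits successfully,
// rejects with gs's stderr appended to the error message otherwise.
function pdfToPngs(inputPath, outputPattern, gsPath = GS_PATH) {
  return new Promise((resolve, reject) => {
    execFile(gsPath, buildGsArgs(inputPath, outputPattern), (err, stdout, stderr) => {
      if (err) {
        err.message += `\n${stderr}`;
        return reject(err);
      }
      resolve();
    });
  });
}
```

Calling `pdfToPngs('/tmp/temp.pdf', '/tmp/temp-%d.png')` should write one PNG per page into /tmp, which can then be read back as Buffers and handed to the rest of the pipeline.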

Rudiger
  • 2
    @SachinChavan That is the exact same outcome except this answer has clearer steps and uses layers for good reuse. It even says it does the same thing in the blog post. – Rudiger Aug 18 '19 at 01:27
  • 2
    You are correct. I was trying this but wasn't able to follow the layer steps, which are really good. I added those binaries to the zip and uploaded it, and it worked. I'll give the layers a try. Thanks for this, it's really useful. – Sachin Chavan Aug 19 '19 at 05:19
  • Let's say I want to take the images and upload them to S3: how do we do that with GS? @Rudiger, if you output to /tmp/temp.png, won't that write to Lambda's local filesystem? And where would that be saved? Lambda is stateless, isn't it (unlike EC2, which has a filesystem)? – KJ Ang Sep 20 '21 at 04:52
  • @KJAng yes, you'll need to write code that picks it up and moves it but thats beyond the scope of this answer. There are a number of tutorials that will show you the code to do this, this answer is specific to the GS part of it. – Rudiger Sep 20 '21 at 05:02
  • @Rudiger, Thanks for answering. Just one more question. Why did you do `var data = fs.readFileSync('/tmp/temp.png');`? Is it because you are going to use data to sent to S3? Is data a buffer or base64 or smth else? – KJ Ang Sep 20 '21 at 05:58
  • This is really old so not sure but from memory I download file from s3, manipulate it, then send it back to s3. I think I just put that line there to imply you get your image from somewhere. – Rudiger Sep 20 '21 at 07:12
  • @Rudiger, I see what you did. You (1)download from s3, (2)write Buffer to local temp `/tmp/temp.pdf`, (3) ask GS to read from local temp `/tmp/temp.pdf`, (4) Tell GS to process and save its output to `/tmp/temp-%d.png` (5) Then here you can read each image `/tmp/temp-%d.png` as Buffer and upload one by one! I tried on lambda and it worked like a charm. Thank you @Rudiger. – KJ Ang Sep 20 '21 at 08:54
  • One additional comment: in my opinion there is NO NEED for ImageMagick or GraphicsMagick (GM). If all you need is to convert a multi-page PDF into images, you can skip adding the ImageMagick layer. Ghostscript is sufficient. In fact, using GS alone is much faster (around 2x - 4x faster on my machine) than using ImageMagick or GM, as IM and GM use GS under the hood. – KJ Ang Sep 20 '21 at 08:59
4

I had the same problem: two cloud services processing thousands of PDF pages a day were failing because of the "pdf.la not found" error.

The solution was to switch from ImageMagick to Ghostscript to convert PDFs to PNGs, and then use ImageMagick on the PNGs (if needed). This way, IM never has to deal with PDFs and won't need the pdf.la file.

To use Ghostscript on AWS Lambda, just include the gs binary in the function zip file.
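A rough sketch of how a bundled binary might be located and given its shared libraries is shown below. It assumes the layout described in the comments on this answer (gs at the root of the function zip, its libraries in a lib folder next to it) and that the function code is unpacked under /var/task. `bundledGs` is a hypothetical helper name. The Lambda runtime may already put /var/task/lib on LD_LIBRARY_PATH, in which case prepending it again is harmless.

```javascript
const path = require('path');

// Return the path of a gs binary shipped in the function zip, plus an
// environment whose LD_LIBRARY_PATH starts with the bundled lib/ folder.
// Assumes gs sits at the zip root and its .so files in lib/ beside it.
function bundledGs(taskRoot = process.env.LAMBDA_TASK_ROOT || '/var/task', env = process.env) {
  const libDir = path.join(taskRoot, 'lib');
  const existing = env.LD_LIBRARY_PATH;
  return {
    binary: path.join(taskRoot, 'gs'),
    env: Object.assign({}, env, {
      LD_LIBRARY_PATH: existing ? `${libDir}:${existing}` : libDir,
    }),
  };
}
```

The result can be passed to child_process, for example `const { binary, env } = bundledGs(); execFile(binary, args, { env }, callback);`.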

  • That sucks. I'll give it a go and update once I've looked into it. Did you just build https://github.com/sina-masnadi/lambda-ghostscript? – Rudiger Jul 25 '19 at 03:58
  • @Rudiger I installed ghostscript on an Amazon Linux EC2 instance, used "whereis" to find the "gs" binary and "ldd gs" to get a list of needed libs. Added the gs binary to the root of the function zip file and the libs to a "lib" folder inside the function zip file. – José Augusto Paiva Jul 25 '19 at 08:46
  • @Rudiger Also, changed the lambda function code so that it would find the gs binary in /var/task/gs. – José Augusto Paiva Jul 25 '19 at 08:53
  • 1
    Thanks, I've added ImageMagick as a layer which works well so I'll be adding gs as one too. – Rudiger Jul 25 '19 at 09:39
  • 1
    Though this wasn't the answer I was after it was the most helpful towards the answer I got to so I'll award the bounty to it. – Rudiger Jul 28 '19 at 22:29
1

You can add a layer to your Lambda function to make it work again until 22/07/2019. The ARN of the layer that you need to add is: arn:aws:lambda:::awslayer:AmazonLinux1703

The procedure is described at upcoming-updates-to-the-aws-lambda-execution-environment

Any long term solution would be wonderful.

Nicolas Oste
  • 1
    Thanks for this. It has definitely fixed the issue but only for 2 days which isn't great. Probably can't mark it as correct due to this but definitely a temporary fix for now. – Rudiger Jul 20 '19 at 03:04
1

I had the issue where ghostscript was no longer found.

Previously, I had referenced ghostscript via:

var gs = '/usr/bin/gs';

Since AWS Lambda stopped providing that package, I included it directly in my Lambda function, which worked for me. I downloaded the files from https://github.com/sina-masnadi/lambda-ghostscript and placed them in a folder called 'ghostscript', then referenced the binary like so:

var path = require('path')
var gs = path.join(__dirname,"ghostscript","bin","gs")
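Because the right path depends on how the binary was packaged, it can help to check at startup which candidate actually exists and is executable. This is a small sketch using only Node's fs module; `findExecutable` is a hypothetical helper name, and the candidate paths in the usage note are just the locations mentioned in this thread.

```javascript
const fs = require('fs');

// Return the first candidate that is a regular file the current
// process may execute, or null if none qualifies.
function findExecutable(candidates) {
  for (const candidate of candidates) {
    try {
      if (fs.statSync(candidate).isFile()) {
        fs.accessSync(candidate, fs.constants.X_OK);
        return candidate;
      }
    } catch (err) {
      // Missing or not executable; try the next candidate.
    }
  }
  return null;
}
```

For example, `findExecutable([path.join(__dirname, 'ghostscript', 'bin', 'gs'), '/opt/bin/gs', '/usr/bin/gs'])` returns the first usable binary. A null result when the file is known to be in the zip may mean the executable bit was lost during packaging.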