
Our AWS Lambda function takes a payload of images that it needs to download to /tmp from AWS CloudFront. It then performs some simple ImageMagick composites on those images. 99% of the time it works flawlessly. Over the past couple of weeks (with no changes on our side), we have started seeing 4 or 5 "504 request timed out" errors every 60 minutes.

In our serverless monitoring (see the screenshot with spans), the function invocation fails because it timed out, yet all of the CloudFront GETs return 200. By 268 ms all the images the function needs are downloaded, and the function appears to move on. However, API Gateway has a maximum timeout of 29 seconds, so even though the function appears to keep running, API Gateway returns a 504. How do I overcome this if the function is running just fine?
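One way to make the 29-second limit visible from inside the function is to give the work an explicit deadline and fail with a named error before API Gateway gives up. The sketch below assumes Node.js; `withDeadline` and `budgetMs` are hypothetical helpers, the 1-second margin is an arbitrary assumption, and only `context.getRemainingTimeInMillis()` is part of the real Lambda Node.js context object. Note that `Promise.race` does not cancel the losing work, and API Gateway's clock starts before the handler runs (cold starts included), hence the margin.

```javascript
// Hypothetical helper: settles with `work`, or rejects if `work` has not
// settled within `ms` milliseconds. The underlying work is NOT cancelled.
function withDeadline(work, ms, label = 'operation') {
  let timer
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms} ms`)), ms)
  })
  // Clear the timer either way so it does not keep the event loop busy.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer))
}

// Hypothetical budget: the smaller of API Gateway's 29 s limit and the time
// Lambda has left, minus a safety margin (both defaults are assumptions).
function budgetMs(context, apiGatewayLimitMs = 29000, marginMs = 1000) {
  const available = Math.min(apiGatewayLimitMs, context.getRemainingTimeInMillis())
  return Math.max(0, available - marginMs)
}
```

Inside a handler declared as `async (event, context) => ...`, the existing steps could then be wrapped as `await withDeadline(doBuild(rack), budgetMs(context), 'buildRack')`, where `doBuild` is a hypothetical function containing the current download/build/upload/invalidate sequence. A logged "buildRack exceeded ... ms" error at least confirms that the function, not only API Gateway, was still busy.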

We enabled X-Ray tracing (see the screenshot), and there are zero details about any kind of function invocation error. As you can see in the screenshot, the function is in pending status. I'm not sure where to go from here to debug. Getting 504 errors is annoying, but I really want to know that the function is running properly.

I thought X-Ray would give me some actual traces that I could look at (maybe I don't have it configured correctly), but there is nothing in X-Ray at this point about where things are timing out in the function invocation steps.

Here is our simplified handler. I'm not including the other functions, which could potentially time out I suppose, but they should be very fast. The image builds in about a second, the upload to S3 takes a few seconds at most, and then the CloudFront invalidation might take a couple more seconds, if that. It shouldn't be hitting 30 seconds, and nothing is throwing an exception.

const { getBucketObjects, buildRackImage, uploadRackImageToS3, invalidateCdn } = require('./utils')

const HTTP_OK = 200

module.exports.buildRack = async event => {
    try {
        let rack

        try {
            rack = event.rack ? event : JSON.parse(event.body)
        } catch (e) {
            throw new Error(`Could not parse the request body: ${e.message}`)
        }

        if (!rack.image_name) {
            throw new Error('Image name was not provided.')
        }

        // basic data validation that all images are either jpg or png
        const errorMsg = []

        if (!Array.isArray(rack.images) || rack.images.length === 0) {
            throw new Error('Images array is empty.')
        }

        for (let i = 0; i < rack.images.length; i++) {
            const typeMatch = rack.images[i].image.match(/\.([^.]*)$/) // Infer the image type.

            if (!typeMatch) {
                errorMsg.push(`Could not determine the image type: ${rack.images[i].image}`)
                continue // typeMatch is null here, so typeMatch[1] below would throw a TypeError
            }

            const imageType = typeMatch[1]

            if (imageType !== 'jpg' && imageType !== 'png') {
                errorMsg.push(`Unsupported image type: ${rack.images[i].image}`)
            }
        }

        if (errorMsg.length > 0) {
            errorMsg.push(JSON.stringify(rack.images))
            throw new Error(errorMsg.join(' '))
        }

        /**
         * Download the rack images from S3, build the rack,
         * and upload to an S3 bucket
         */
        const getObjectResponse = await getBucketObjects(rack)

        // Throw a real Error: throwing the falsy response itself (undefined/false) loses all context
        if (!getObjectResponse) {
            throw new Error('Downloading the rack images failed.')
        }

        /**
         * Build the rack image locally using imagemagick
         */
        const buildRackImageResponse = await buildRackImage(rack)

        if (!buildRackImageResponse) {
            throw new Error('Building the rack image failed.')
        }

        /**
         * Upload the rack image to S3
         */
        const uploadRackImageResponse = await uploadRackImageToS3(rack.image_name)

        if (!uploadRackImageResponse) {
            throw new Error('Uploading the rack image to S3 failed.')
        }

        /**
         * Invalidate the rack image name from CDN if it exists
         */
        const invalidateCdnResponse = await invalidateCdn(rack.image_name)

        if (!invalidateCdnResponse) {
            throw new Error('Invalidating the CDN failed.')
        }

        return {
            statusCode: HTTP_OK,
            body: JSON.stringify({
                message: 'Rack Successfully Built!',
                statusCode: HTTP_OK,
            }),
            isBase64Encoded: false,
        }
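    } catch (e) {
        // JSON.stringify(new Error(...)) yields "{}", so log the error itself
        // eslint-disable-next-line no-console
        console.error(e)

        // Rethrow so Lambda records the invocation as failed
        throw e instanceof Error ? e : new Error(String(e))
    }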
}
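Since X-Ray shows nothing about the individual steps, a low-tech alternative (plain CloudWatch logging, not X-Ray) is to time each step and log how much Lambda time remains after it. This is a sketch under assumptions: `timedStep` is a hypothetical helper, and it relies only on `Date.now()` and the real `context.getRemainingTimeInMillis()`. The step names mirror the handler's functions above.

```javascript
// Hypothetical helper: runs one async step, logs its duration and the
// remaining Lambda time, and rethrows any error with the step name attached.
async function timedStep(name, fn, context) {
  const start = Date.now()
  try {
    return await fn()
  } catch (err) {
    // `cause` keeps the original error (honoured by Node 16.9+, ignored earlier).
    const detail = err && err.message ? err.message : err
    throw new Error(`Step "${name}" failed: ${detail}`, { cause: err })
  } finally {
    const remaining = context && typeof context.getRemainingTimeInMillis === 'function'
      ? context.getRemainingTimeInMillis()
      : 'n/a'
    // One JSON line per step is easy to filter in CloudWatch Logs Insights.
    console.log(JSON.stringify({ step: name, durationMs: Date.now() - start, remainingMs: remaining }))
  }
}
```

In a handler declared as `async (event, context) => ...`, each call would become, for example, `await timedStep('uploadRackImageToS3', () => uploadRackImageToS3(rack.image_name), context)`. On a timed-out invocation, the last step that logged a line (or the first one missing) points at where the time went.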

Zelf

1 Answer

A typical cause of this is a Lambda function that runs inside a VPC and tries to access the public internet, but can't for one of the reasons below:

  1. The Lambda is associated with private subnets, but not all of them route to a NAT gateway (a misconfiguration in the route tables).
  2. The Lambda is associated with public subnets, but not all of them have an Elastic IP associated with the underlying ENI.

It could be that your Lambda is linked to multiple subnets (maybe all of them), but not all of those subnets provide public internet access, which causes seemingly random 504 errors.
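To check which of the Lambda's subnets actually have a default route, the route tables can be classified offline. This is a sketch under assumptions: `classifySubnetEgress` is a hypothetical helper, and its input is assumed to have the shape of the `RouteTables` array returned by EC2 `DescribeRouteTables` (e.g. from `aws ec2 describe-route-tables --filters Name=vpc-id,Values=<your-vpc>`), restricted to a single VPC. The subnet IDs would come from `VpcConfig.SubnetIds` in `aws lambda get-function-configuration`.

```javascript
// Classify each subnet by where its 0.0.0.0/0 route points.
// Assumes `routeTables` all belong to ONE VPC (so there is one main table).
function classifySubnetEgress(routeTables, subnetIds) {
  const mainTable = routeTables.find(rt => (rt.Associations || []).some(a => a.Main))
  const result = {}
  for (const subnetId of subnetIds) {
    // A subnet without an explicit association uses the VPC's main route table.
    const table = routeTables.find(rt =>
      (rt.Associations || []).some(a => a.SubnetId === subnetId)) || mainTable
    const defaultRoute = table && (table.Routes || []).find(r =>
      r.DestinationCidrBlock === '0.0.0.0/0' && r.State !== 'blackhole')
    if (!defaultRoute) {
      result[subnetId] = 'no-internet-route'
    } else if (defaultRoute.NatGatewayId) {
      result[subnetId] = 'nat-gateway'
    } else if ((defaultRoute.GatewayId || '').startsWith('igw-')) {
      // Per reason 2 above, this only helps if the Lambda ENI has a public/Elastic IP.
      result[subnetId] = 'internet-gateway'
    } else {
      result[subnetId] = 'other'
    }
  }
  return result
}
```

Any subnet that comes back as anything other than "nat-gateway" (for private subnets) is a candidate for the intermittent failures described above.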

Solutions

  1. When a Lambda function or EC2 instance communicates with S3 and some other AWS services, by default it does so via public endpoints, which requires internet access. This can be fixed by using VPC endpoints, so that no public internet access is required (the traffic stays within the AWS network).

The solutions below apply if you need public internet access for other resources.

  1. If you have private subnets with a NAT gateway, it's best to just remove the public subnets from your Lambda's configuration, as explained here.

  2. If you are using public subnets (like the default VPC, in which all subnets are public by default) and you want to keep it that way, then attach a public IP to the ENI for each subnet/security group combination associated with the Lambda, as explained here.

  3. Another solution is to keep the Lambda outside of any VPC (it then has public internet access by default). We only attach a Lambda to a VPC when it needs to access resources in that VPC (e.g. a database).
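For S3 specifically, a gateway VPC endpoint works by adding a route to the route tables you list, so every route table used by the Lambda's subnets should be included. The sketch below only builds the request parameters; `s3GatewayEndpointParams` is a hypothetical helper, while the field names match the EC2 `CreateVpcEndpoint` API. This covers S3 only; other calls the handler makes (the CloudFront downloads and the invalidation in `invalidateCdn`) would still need internet access or their own endpoint, if one exists.

```javascript
// Build input for EC2 CreateVpcEndpoint to add an S3 *gateway* endpoint.
function s3GatewayEndpointParams(vpcId, region, routeTableIds) {
  if (!vpcId || !region || !Array.isArray(routeTableIds) || routeTableIds.length === 0) {
    throw new Error('vpcId, region and at least one route table ID are required')
  }
  return {
    VpcId: vpcId,
    ServiceName: `com.amazonaws.${region}.s3`,
    VpcEndpointType: 'Gateway',
    RouteTableIds: [...new Set(routeTableIds)], // drop duplicate route table IDs
  }
}
```

With the AWS SDK for JavaScript v3, the result could be passed as `new EC2Client({ region }).send(new CreateVpcEndpointCommand(params))` from `@aws-sdk/client-ec2`; in practice this is often declared in infrastructure code instead.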

Alisson Reinaldo Silva