
I am writing a Lambda function in TypeScript. When running terraform apply in GitHub Actions I want to build the Lambda (which requires installing the dependencies and then webpacking everything), zip it, and then deploy it to AWS. The trigger for the null_resource used to be a timestamp, but that meant it ran all of these steps (build, zip, deploy) even when nothing had changed.

So I changed the trigger to a hash of all the files in the lambda subdirectory

resource "null_resource" "build_lambda_function" {
  triggers = {
    # timestamp = timestamp()
    dir_sha1 = sha1(join("", [for f in fileset(path.root, "${local.lambda_path_prefix}/${var.lambda_dir}/**"): filesha1(f)]))
  }

  provisioner "local-exec" {
    command     = "npm ci && npm run build"
    working_dir = "${local.lambda_path_prefix}/${var.lambda_dir}"
  }
}

This is how I create the zip and deploy it

data "archive_file" "lambda_function_zip" {
  type        = "zip"
  source_dir  = "${local.lambda_path_prefix}/${var.lambda_dir}/dist"
  output_path = "${local.lambda_path_prefix}/${var.lambda_dir}.zip"
  depends_on = [
    null_resource.build_lambda_function
  ]
}

resource "aws_lambda_function" "lambda_function" {
  function_name    = var.lambda_name
  source_code_hash = data.archive_file.lambda_function_zip.output_base64sha256
  filename         = data.archive_file.lambda_function_zip.output_path
  ...
}

This worked great the first time I ran it in GitHub Actions (because the null resource was triggered), but on the second run it failed. Since there was no change, the Lambda was not built, no dist folder was created, and archive_file threw an error:

error archiving directory: could not archive missing directory: ./../lambda/testlambda/dist

Is what I want even possible?

Marc
    Most CI/CD systems I've used just rebuild the lambda artifacts each time. There is a way to cache things in GH Actions; perhaps you could configure it to cache the dist folder if you don't want to rebuild it each time: https://github.com/actions/cache – JD D Sep 10 '22 at 11:42

1 Answer


The archive_file data source looks at your local filesystem and runs every time you do a terraform plan. Having a depends_on does nothing other than ordering: do the archive after the null_resource. If the null_resource isn't triggered, it will still report "okay, I'm done!" and your archive_file will run regardless.

As the comment above mentioned, most people rebuild every time using a trigger such as timestamp(). But as engineers/developers we want the following:

Only build and deploy my lambda if code has changed, otherwise I don't want to waste time watching this happen.

To achieve this outcome we must create a persistent data store.

There are a few solutions...

  1. Use Docker images for Lambda. I haven't tested this, but technically it should work because you'll just push a new image and your Lambda will constantly look for latest. Here's an example I've managed to find, and a rough sketch follows after this list. Whether latest is good or not when it comes to image tags... that's another topic. In this case ECR is your persistent data store.

  2. Manually re-create everything archive_file is doing using Bash or similar. A full working example follows the Docker sketch below.
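For option 1, a minimal, untested sketch might look like the following. The aws_ecr_repository resource, its name, and the CI step that builds and pushes the image before terraform apply are all assumptions here; in practice you'd likely want to pin a per-build tag or digest rather than latest so Terraform actually sees a change in image_uri.

resource "aws_ecr_repository" "lambda" {
  name = "my-lambda-repo" # hypothetical repository; ECR is the persistent data store in this setup
}

resource "aws_lambda_function" "lambda_from_image" {
  function_name = var.lambda_name
  package_type  = "Image"                                              # container image instead of a .zip package
  image_uri     = "${aws_ecr_repository.lambda.repository_url}:latest" # image pushed by CI before terraform apply
  role          = aws_iam_role.role.arn
  # runtime and handler are omitted; the image's ENTRYPOINT/CMD define them
}

Now, the full working example for option 2: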

lambda.tf

data "aws_s3_object" "s3_archive" {
  bucket = "mybucket"
  key    = "lambda/build.zip"
  depends_on = [
    null_resource.build  # wait until our upload script is done
  ]
}

resource "aws_lambda_function" "lambda_function" {
  function_name    = "Pokemon"
  s3_bucket        = "mybucket"
  s3_key           = "lambda/build.zip"
  source_code_hash = data.aws_s3_object.s3_archive.metadata.Sha # this is our custom metadata tag which has the same value as data.archive_file.lambda_function_zip.output_base64sha256 would have
  runtime          = "python3.9"
  handler          = "handler.handler"
  role             = aws_iam_role.role.arn
  depends_on = [
    null_resource.build # don't make lambda until after our upload.sh script
  ]
}

resource "null_resource" "build" {
  triggers = {
    requirements = filesha256("./requirements.txt") # change this file, we run script
    source_code  = filesha256("./handler.py") # change this file, we run script
  }

  provisioner "local-exec" {
    command = "./upload.sh" # run upload.sh script
    interpreter = [
      "bash", "-c"
    ]
  }
}

resource "aws_iam_role" "role" {
  name = "lambda_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

handler.py (I know you're using Node, but I'm putting it here anyway)

import requests


def handler(event, context):
    r = requests.get('https://pokeapi.co/api/v2/pokemon/1/').json()
    print(r)
    return "Hello World!"

This is where it gets a bit nasty; Bash on Windows makes my life hard...

upload.sh

#!/usr/bin/env bash
mkdir -p ./build # create build directory
cp handler.py ./build # copy handler.py code to build directory
cp requirements.txt ./build # copy requirements.txt code to build directory (this is like your package.json)
pip install --target ./build -r requirements.txt # this is like your npm install command (install dependencies into build directory)
'C:/Program Files/7-Zip/7z.exe' a -r build.zip ./build/* # I'm on windows so cannot use `zip` like Linux/Mac but basically .zip the entire build directory
# On Linux/Mac you could do: (cd build && zip -r -q ../build.zip .) so the files sit at the root of the zip
SHA=$(sha256sum build.zip | cut -f1 -d \ | xxd -r -p | base64) # Generate a sha256 base64 encoded string (this is what lambda requires based on TF docs)
echo $SHA # Echo for debugging purposes

# Copy .zip to s3 and append metadata `sha` including our sha256 base64 encoded value so we can use it
# to detect if the .zip differs from what our Lambda function has as its source_code_hash
aws s3 cp ./build.zip s3://mybucket/lambda/build.zip --metadata sha=$SHA

After doing a plan and apply, this is my metadata on the build.zip in S3:

[screenshot of the S3 object's metadata, including the sha value]

And this is my Lambda Terraform state:

[screenshot of the aws_lambda_function Terraform state]

Now when I run another terraform plan:

No changes. Your infrastructure matches the configuration.

When I edit requirements.txt or my Python source code (there are S3 changes too, but I left those out of the screenshot):

[screenshot of the terraform plan output]

Run terraform apply:

[screenshot of the terraform apply output]

Run a follow-up terraform plan after deleting ALL build files from my desktop, because why not:

No changes. Your infrastructure matches the configuration.

Obviously this is quite an extreme solution, so if you don't mind rebuilding build assets every time then just use the archive_file data source. If that's a deal breaker, use what I've written above; I'm not aware of any other solutions, and every GitHub issue I've seen has been "sorry, that's just the way it is" for now.

EDIT: Just to add to my answer, you can avoid using S3 altogether and calculate the SHA inside Terraform by following the Lambda docs:

  # The filebase64sha256() function is available in Terraform 0.11.12 and later
  # For Terraform 0.11.11 and earlier, use the base64sha256() function and the file() function:
  # source_code_hash = "${base64sha256(file("lambda_function_payload.zip"))}"
  source_code_hash = filebase64sha256("lambda_function_payload.zip")

filebase64sha256 requires the .zip to be present before Terraform runs, though. You'd want to pull it from a persistent store such as Artifactory, S3, or a shared drive; I believe a freshly built .zip won't work (its hash changes on every build).

You'd still do the archiving and zipping using a bash/python script prior to running Terraform but can avoid S3 with this method.
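A rough sketch of that approach, assuming a CI step has already downloaded build.zip from your persistent store into the working directory before terraform plan/apply (the file name is just illustrative, and this would replace the S3-based resource above):

resource "aws_lambda_function" "lambda_function" {
  function_name    = "Pokemon"
  filename         = "build.zip"                    # fetched from Artifactory/S3 before Terraform runs
  source_code_hash = filebase64sha256("build.zip")  # only changes when the stored .zip changes
  runtime          = "python3.9"
  handler          = "handler.handler"
  role             = aws_iam_role.role.arn
}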

EDIT 2

You may be able to use a Terraform data source as well:

data "external" "build_archive" {
  program = ["python3", "/scripts/build_archive.py"]

  query = {
    directory = "./my_source_code"
    name      = "my_source_code"
  }
}
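For reference, the external data source sends the query map to the program as JSON on stdin and expects a JSON object of string values back on stdout, which Terraform exposes under result. So if the (hypothetical) build_archive.py printed something like {"sha": "..."}, you would reference it as:

  source_code_hash = data.external.build_archive.result.sha

That wiring is just how the external provider works; the real problem, described below, is making that sha stable between runs.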

It looks like .zip archives embed dates and times as metadata, so their SHA is ever-changing. Perhaps a data source is not possible, but I'll leave this here to spark further investigation from others.

In my attempt I had the data source returning a calculated SHA, but I experienced continual drift even when using .tar.gz archives with metadata excluded. Lambda couldn't upload a .tar.gz, so I assume AWS re-calculated the SHA from my .zip archive and ignored my source_code_hash value. If I SHA'd the .tar.gz in isolation, its hash did remain consistent.

Again, this isn't using a persistent data store so that's likely the issue.

Leslie Alldridge