
I upload my Lambda function sources from AWS CodeBuild. My Python script uses NLTK, so it needs a lot of data. My .zip package is too big, and a RequestEntityTooLargeException occurs. I want to know how to increase the size of the deployment package sent via the UpdateFunctionCode command.

I use AWS CodeBuild to build and deploy the source from a GitHub repository to AWS Lambda. Here is the associated buildspec file:

version: 0.2
phases:
 install:
   commands:
     - echo "install step"
     - apt-get update
     - apt-get install zip -y
     - apt-get install python3-pip -y
     - pip install --upgrade pip
     - pip install --upgrade awscli
     # Define directories
     - export HOME_DIR=`pwd`
     - export NLTK_DATA=$HOME_DIR/nltk_data
 pre_build:
   commands:
     - echo "pre_build step"
     - cd $HOME_DIR
     - virtualenv venv
     - . venv/bin/activate
     # Install modules
     - pip install -U requests
     # NLTK download
     - pip install -U nltk
     - python -m nltk.downloader -d $NLTK_DATA wordnet stopwords punkt
     - pip freeze > requirements.txt
 build:
   commands:
     - echo 'build step'
     - cd $HOME_DIR
     - mv $VIRTUAL_ENV/lib/python3.6/site-packages/* .
     - sudo zip -r9 algo.zip .
     - aws s3 cp --recursive --acl public-read ./ s3://hilightalgo/
     - aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip
     - aws lambda update-function-configuration --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --environment 'Variables={NLTK_DATA=/var/task/nltk_data}'
 post_build:
   commands:
     - echo "post_build step"

When I launch the pipeline, I get a RequestEntityTooLargeException because my .zip package contains too much data. See the build logs below:

[Container] 2019/02/11 10:48:35 Running command aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip
An error occurred (RequestEntityTooLargeException) when calling the UpdateFunctionCode operation: Request must be smaller than 69905067 bytes for the UpdateFunctionCode operation
[Container] 2019/02/11 10:48:37 Command did not exit successfully aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip exit status 255
[Container] 2019/02/11 10:48:37 Phase complete: BUILD Success: false
[Container] 2019/02/11 10:48:37 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip. Reason: exit status 255

Everything works correctly when I reduce the NLTK data to download (I tried with only the stopwords and wordnet packages).

Does anyone have an idea to solve this "size limit problem"?

Louis Singer

13 Answers


You cannot increase the deployment package size for Lambda. AWS Lambda limits are described in the AWS Lambda Developer Guide. More information on how those limits work can be found here. In essence, your unzipped package size has to be less than 250MB (262144000 bytes).
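To see how close a package is to that quota before deploying, the uncompressed size can be summed from the zip's own metadata. A minimal sketch using only the standard library; the function names are hypothetical, and `layer_bytes` is an assumed input for the unzipped size of any attached layers:

```python
import zipfile

# Quota for the unzipped function code plus all of its layers (250 MB).
UNZIPPED_LIMIT_BYTES = 262_144_000


def unzipped_size(zip_path):
    """Total uncompressed size, in bytes, of every member of a .zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # ZipInfo.file_size is the uncompressed size recorded in the archive.
        return sum(info.file_size for info in zf.infolist())


def fits_unzipped_limit(zip_path, layer_bytes=0):
    """True if the package plus `layer_bytes` of unzipped layers stays within the quota."""
    return unzipped_size(zip_path) + layer_bytes <= UNZIPPED_LIMIT_BYTES
```

For example, running `unzipped_size("algo.zip")` on the question's package would show how far the NLTK data pushes it past 262144000 bytes, without uploading anything.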

PS: Using layers doesn't solve the sizing problem, though it helps with management and maybe gives faster cold starts. The package size includes the layers (see Lambda layers):

A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB.

Update Dec 2020: As per the AWS blog, and as pointed out by user jonnocraig in this answer, you can overcome these restrictions if you build a container image for your application and run it on Lambda.

asr9
  • So if I try to include pandas, which brings in numpy, that accounts for 126 MB by itself. Add botocore and there is another 48MB. – Samantha Atkins Feb 26 '20 at 05:38
  • Yes. I am not conversant enough in Python, but do check whether `botocore` is included in the Python SDK. If it is, then you don't have to include it, and it doesn't count toward the package size. – asr9 Feb 26 '20 at 15:53
  • Deleting pycache files and test files might help reduce the size: https://github.com/aws/sagemaker-python-sdk/issues/1200 – ae0709 Jan 29 '21 at 16:53
  • pyarrow alone is 200 MB. NumPy is 46MB, SciPy 113 MB, pandas 27MB, and that's not even counting AWS dependencies. Essentially, for a Python data scientist's hello world, you need at least 500 MB. – Got To Figure May 08 '23 at 21:30
  • You can get it down by building these (NumPy and SciPy) from source. I don't know if there's an easier way, but see my comments in this thread for key parts of the Dockerfile I used: https://discuss.python.org/t/how-to-use-pip-install-to-build-some-scientific-packages-from-sources-with-custom-build-arguments/24717/15 – Louis Maddox Jun 21 '23 at 21:46
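Following the suggestion in the comments to delete `__pycache__` and test files, here is a hedged sketch of a pruning step that could run on the build directory before zipping. The function name is hypothetical and the set of directory names is an assumption: some packages ship a `test` or `tests` subpackage that their own code imports, so check before removing it.

```python
import shutil
from pathlib import Path

# Directory names assumed safe to drop from the deployment package.
PRUNE_DIRS = {"__pycache__", "tests", "test"}


def _tree_size(path):
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def prune_build_dir(build_dir):
    """Delete __pycache__/test directories and stray .pyc files; return bytes freed."""
    root = Path(build_dir)
    freed = 0
    # Collect matches first (sorted() materializes the generator), shallowest first,
    # so matches nested inside an already deleted directory are simply skipped.
    candidates = sorted(
        (p for p in root.rglob("*") if p.is_dir() and p.name in PRUNE_DIRS),
        key=lambda p: len(p.parts),
    )
    for d in candidates:
        if d.exists():
            freed += _tree_size(d)
            shutil.rmtree(d)
    # Remove compiled bytecode left outside __pycache__ directories.
    for pyc in root.rglob("*.pyc"):
        freed += pyc.stat().st_size
        pyc.unlink()
    return freed
```

In the question's buildspec, a step like this would sit between the `mv` of site-packages and the `zip -r9` command. It only trims overhead, so it will not help if the NLTK data alone exceeds the limits.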

If anyone stumbles across this issue after December 2020: AWS has released a major update that supports Lambda functions packaged as container images (up to 10GB!). More info here
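For an image-based function, redeploying means pointing it at a new image in Amazon ECR rather than uploading a zip. A minimal boto3 sketch, assuming the function was already created with the Image package type (an existing .zip function can't be converted to an image in place) and that the image has already been pushed; `deploy_image` and the URI format in the docstring are illustrative:

```python
def deploy_image(function_name, image_uri, client=None):
    """Point an existing image-based Lambda function at a new ECR image.

    image_uri is assumed to look like
    "<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>".
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed in
        client = boto3.client("lambda")
    # With ImageUri, no zip is sent at all, so the 50 MB request limit does not apply.
    return client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
```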

jonnocraig
  • The reference provided above is for the Lambda function size, not for Lambda layers: "The total unzipped size of the function and all layers cannot exceed the unzipped deployment package size quota of 250 MB." See [here](https://docs.aws.amazon.com/lambda/latest/dg/invocation-layers.html#:~:text=The%20total%20unzipped%20size%20of%20the%20function%20and%20all%20layers%20cannot%20exceed%20the%20unzipped%20deployment%20package%20size%20quota%20of%20250%20MB.) – RianLauw Mar 01 '22 at 12:15

AWS Lambda functions can mount an EFS file system. Using EFS, you can load libraries or packages that are larger than Lambda's 250 MB deployment package size limit.

Detailed steps on how to set it up are here: https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/

On a high level, the changes include:

  1. Create and set up an EFS file system
  2. Attach the EFS file system to the Lambda function
  3. Install the pip dependencies into the EFS access point
  4. Set the PYTHONPATH environment variable to tell Python where to look for the dependencies
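Steps 3 and 4 can be sketched from the function's side: instead of (or in addition to) `PYTHONPATH`, the handler module can prepend the EFS directory to `sys.path` and point NLTK at data stored on the same file system. The mount path `/mnt/efs`, the `EFS_MOUNT_PATH` variable and the `python` and `nltk_data` subdirectories are hypothetical; use whatever local mount path you configured for the access point.

```python
import os
import sys

# Hypothetical local mount path of the EFS access point; override via EFS_MOUNT_PATH.
EFS_MOUNT = os.environ.get("EFS_MOUNT_PATH", "/mnt/efs")


def use_efs_packages(mount=EFS_MOUNT, packages_subdir="python", nltk_subdir="nltk_data"):
    """Make packages and NLTK data stored on EFS visible to this process.

    Call it at module level in the handler file, before importing nltk.
    """
    packages_dir = os.path.join(mount, packages_subdir)
    # Same effect as PYTHONPATH for all imports that happen after this call.
    if packages_dir not in sys.path:
        sys.path.insert(0, packages_dir)
    # nltk reads NLTK_DATA when it is first imported, so set it beforehand.
    os.environ.setdefault("NLTK_DATA", os.path.join(mount, nltk_subdir))
    return packages_dir
```

The `sys.path` check keeps warm invocations from prepending the same directory again.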
rahul

The following are hard limits for Lambda (they may change in the future):

  • 3 MB for in-console editing
  • 50 MB zipped, for a package uploaded directly
  • 250 MB when unzipped including layers
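The limits above can be turned into a rough decision helper. This is an illustration, not an AWS API, and it assumes MB means 1024 × 1024 bytes, which matches the 262144000 bytes quoted for 250 MB. Under the same reading, the `69905067 bytes` request limit in the question's log appears to be 50 MB after base64 encoding (50 × 1024 × 1024 × 4/3 ≈ 69905067), which is how the zip travels in an `--zip-file` request.

```python
MB = 1024 * 1024

DIRECT_UPLOAD_LIMIT = 50 * MB   # zipped package sent in the request (e.g. --zip-file)
UNZIPPED_LIMIT = 250 * MB       # function code plus all layers, once unzipped


def deployment_route(zipped_bytes, unzipped_bytes):
    """Suggest a deployment route for a .zip package based on the limits listed above."""
    if unzipped_bytes > UNZIPPED_LIMIT:
        # No .zip route works; the answers here suggest container images or EFS.
        return "container image or EFS"
    if zipped_bytes > DIRECT_UPLOAD_LIMIT:
        return "upload to S3, then deploy from S3"
    return "direct upload"
```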

A sensible way to get around this is to mount EFS from your Lambda. This can be useful not only for loading libraries, but also for other storage.

Have a look through these blogs:

Jack
  • Really nice blogs, but for someone less technically inclined like myself, how does this work under the hood? I understand EFS can get around the Lambda hard limit, but it's not clear to me how it does this. For example, if my Lambda is downloading a very large list and it keeps reaching the hard limit, how would EFS help with that? – buydadip Sep 21 '22 at 21:41

I have not tried this myself, but the folks at Zappa describe a trick that might help. Quoting from https://blog.zappa.io/posts/slim-handler:

Zappa zips up the large application and sends the project zip file up to S3. Second, Zappa creates a very minimal slim handler that just contains Zappa and its dependencies and sends that to Lambda.

When the slim handler is called on a cold start, it downloads the large project zip from S3 and unzips it in Lambda’s shared /tmp space. All subsequent calls to that warm Lambda share the /tmp space and have access to the project files; so it is possible for the file to only download once if the Lambda stays warm.

This way you should get 500MB in /tmp.

Update:

I have used the following code in the Lambdas of a couple of projects. It is based on the method Zappa uses, but can be used directly.

# Based on the code in https://github.com/Miserlou/Zappa/blob/master/zappa/handler.py
# We need to load the layer from an S3 bucket into /tmp, bypassing the normal
# AWS layer mechanism, since it is too large: the unzipped Lambda function size,
# including layers, is limited to 250MB.
import io
import os
import sys
import zipfile

import boto3


def load_remote_project_archive(remote_bucket, remote_file, layer_name):

    # Puts the project files from S3 in /tmp and adds them to sys.path
    project_folder = '/tmp/{0!s}'.format(layer_name)
    if not os.path.isdir(project_folder):
        # The project folder doesn't exist in this cold Lambda, so get it from S3
        boto_session = boto3.Session()

        # Download the zip file from S3
        s3 = boto_session.resource('s3')
        archive_on_s3 = s3.Object(remote_bucket, remote_file).get()

        # Unzip from the in-memory stream (a new BytesIO already starts at offset 0)
        with io.BytesIO(archive_on_s3["Body"].read()) as zf:

            # Read the buffer as a zip file and extract all members
            with zipfile.ZipFile(zf, mode='r') as zipf:
                zipf.extractall(project_folder)

    # Add to the project path once, so warm invocations don't keep prepending it
    if project_folder not in sys.path:
        sys.path.insert(0, project_folder)

    return True

This can then be called as follows (I pass the bucket with the layer to the lambda function via an env variable):

load_remote_project_archive(os.environ['MY_ADDITIONAL_LAYERS_BUCKET'], 'lambda_my_extra_layer.zip', 'lambda_my_extra_layer')

At the time I wrote this code, /tmp was also capped (I think to 250MB). However, the call to zipf.extractall(project_folder) above can be replaced with extracting directly to memory: unzipped_in_memory = {name: zipf.read(name) for name in zipf.namelist()}. I did this for some machine learning models. I guess the answer of @rahul is more versatile for this, though.
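The in-memory variant mentioned above can be wrapped in a small helper. This sketch assumes the archive bytes come from `archive_on_s3["Body"].read()` as in the code above; `read_zip_into_memory` is a hypothetical name, and how you load a model from the returned bytes depends on the library (often via `io.BytesIO`), so that part is left out. Everything extracted this way counts against the function's configured memory instead of /tmp.

```python
import io
import zipfile


def read_zip_into_memory(zip_bytes):
    """Map each member name of a zip archive to its uncompressed bytes, without using /tmp."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
        return {name: zipf.read(name) for name in zipf.namelist()}
```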

ikkjo

From the AWS documentation:

If your deployment package is larger than 50 MB, we recommend uploading your function code and dependencies to an Amazon S3 bucket.

You can create a deployment package and upload the .zip file to your Amazon S3 bucket in the AWS Region where you want to create a Lambda function. When you create your Lambda function, specify the S3 bucket name and object key name on the Lambda console, or using the AWS Command Line Interface (AWS CLI).

You can use the AWS CLI to deploy the package: instead of passing the deployment package with the --zip-file argument, you can specify the object in the S3 bucket with the --code parameter. For example:

aws lambda create-function --function-name my_function --code S3Bucket=my_bucket,S3Key=my_file
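The same S3 route also works for updating an existing function, which is what the question's buildspec does; with the CLI that is `aws lambda update-function-code --function-name ... --s3-bucket ... --s3-key ...`. A boto3 sketch that uploads the zip and then points the function at it follows. The function name and arguments are placeholders, the bucket is assumed to be in the same Region as the function, and the 250 MB unzipped limit still applies.

```python
def deploy_zip_via_s3(zip_path, bucket, key, function_name,
                      s3_client=None, lambda_client=None):
    """Upload a deployment .zip to S3, then update an existing function from that object."""
    if s3_client is None or lambda_client is None:
        import boto3  # imported lazily so stub clients can be passed in
        s3_client = s3_client or boto3.client("s3")
        lambda_client = lambda_client or boto3.client("lambda")
    # The zip goes to S3 directly, so it is not bound by the UpdateFunctionCode request limit.
    s3_client.upload_file(zip_path, bucket, key)
    return lambda_client.update_function_code(
        FunctionName=function_name, S3Bucket=bucket, S3Key=key
    )
```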
Julio Oliveira
  • This helps with the 50MB limit for the deployment .zip file. However, the 250MB limit for the unzipped content is still enforced, even when using S3. – mc51 Dec 21 '22 at 11:56

This AWS Data Wrangler zip file from GitHub (https://github.com/awslabs/aws-data-wrangler/releases) includes many other libraries, such as pandas and PyMySQL. In my case it was the only layer I needed, since it bundles so much else. It might work for some people.

[Image: all of the included libraries]

Victor Ranu

You can try the workaround used in the awesome serverless-python-requirements plugin.

The ideal solution is to use Lambda layers if they serve the purpose. If the total size of the dependencies is greater than 250MB, you can sideload lesser-used dependencies from an S3 bucket at run time by using the 512 MB provided in the /tmp directory. The zipped dependencies are stored in S3, and the Lambda function fetches them from S3 during initialisation, unzips the dependency package, and adds its path to sys.path.
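A hedged sketch of that sideloading pattern. Unlike the Zappa-based code in another answer, it streams the archive to a file in /tmp with `download_file` instead of reading it into memory. `sideload_requirements`, the target directory, bucket and key are assumptions, and the temporary zip plus the extracted files must both fit in /tmp at the peak.

```python
import os
import sys
import zipfile


def sideload_requirements(bucket, key, target_dir="/tmp/sideloaded", s3_client=None):
    """Fetch a zipped dependency bundle from S3 once per container and import from it."""
    if not os.path.isdir(target_dir):
        # Cold start: nothing extracted yet in this execution environment.
        if s3_client is None:
            import boto3  # imported lazily so a stub client can be used
            s3_client = boto3.client("s3")
        archive_path = target_dir + ".zip"
        s3_client.download_file(bucket, key, archive_path)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target_dir)
        os.remove(archive_path)  # free /tmp space; only the extracted files are needed
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```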

Please note that the Python dependencies need to be built on Amazon Linux, which is the operating system for Lambda containers. I used an EC2 instance to create the zip package.
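Building on EC2 is one way to get Amazon Linux compatible packages. A different technique, not the one described above, is to have pip fetch prebuilt Linux wheels from any machine with `--platform`. This sketch only assembles and runs the command. It assumes every dependency publishes a manylinux wheel, an x86_64 function, and a Python 3.12 runtime, all of which you should adjust; the function names are hypothetical.

```python
import subprocess
import sys


def lambda_pip_command(requirements, target_dir, python_version="3.12",
                       platform="manylinux2014_x86_64"):
    """pip command that installs Linux wheels for Lambda into target_dir, on any host OS."""
    return [
        sys.executable, "-m", "pip", "install",
        "--platform", platform,               # e.g. manylinux2014_aarch64 for arm64
        "--python-version", python_version,   # should match the function's runtime
        "--implementation", "cp",
        "--only-binary=:all:",                # pip refuses --platform without this (or --no-deps)
        "--target", target_dir,
        *requirements,
    ]


def build_dependencies(requirements, target_dir, **kwargs):
    """Run pip; raises CalledProcessError if, e.g., no matching wheel exists."""
    subprocess.run(lambda_pip_command(requirements, target_dir, **kwargs), check=True)
```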

You can check the code used in serverless-python-requirements here.

Jedi3112

Before December 2020, the best way was to upload the deployment package (a .jar file for Java) to S3 and create the AWS Lambda function from it.

Since then, AWS Lambda has supported container images. Read more here: https://aws.amazon.com/de/blogs/aws/new-for-aws-lambda-container-image-support/

So from now on, you should probably consider packaging and deploying your Lambda functions as container images (up to 10 GB).

Yang Liu

The trick for deploying a large Lambda project to AWS is to use a Docker image stored in Amazon ECR instead of a ZIP file. A Docker image can be up to 10 GB.

The AWS documentation provides an example to help you here: Create an image from an AWS base image for Lambda


The solution for this is to use Lambda container images, where the size limit extends to 10GB.


I may be late to the party, but you can use a Docker image to get around the Lambda layer size constraint. This can be done using serverless stack development or just through the console.

Alex w

You cannot increase the package size, but you can use AWS Lambda layers to store some application dependencies.

https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-path

Before layers existed, a commonly used pattern to work around this limitation was to download huge dependencies from S3.

Mario
  • This doesn't solve the sizing problem. Package size includes the layers - https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html. `A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB.` – asr9 Feb 11 '19 at 19:51