
I'm following this blog post to create a runtime environment using Docker for use with AWS Lambda. I'm creating a layer for use with Python 3.8:

docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"

And then archiving the layer as a zip: zip -9 -r mylayer.zip python

All standard so far. The problem arises in the .zip size, which is > 250mb and so creates the following error in Lambda: Failed to create layer version: Unzipped size must be smaller than 262144000 bytes.

Here's my requirements.txt:

s3fs
scrapy
pandas
requests

I'm including s3fs since I get the following error when trying to save a parquet file to an S3 bucket using pandas: [ERROR] ImportError: Install s3fs to access S3. The problem is that including s3fs massively increases the layer size. Without s3fs the layer is < 200mb unzipped.

My most direct question would be: how can I reduce the layer size to < 250mb while still using Docker and keeping s3fs in my requirements.txt? I can't explain the 50mb+ difference, especially since s3fs is < 100kb on PyPI.

Finally, for those questioning my use of Lambda with Scrapy: my scraper is trivial, and spinning up an EC2 instance would be overkill.

mmz
    If you have lots of dependencies, container lambdas are the solution. You have 10GB for your dependencies. – Marcin Sep 28 '21 at 02:29
  • appreciate the suggestion @Marcin - will take a look. for the time being, using Python 3.6 rather than 3.8 sufficiently decreased the layer size – mmz Sep 28 '21 at 02:46
  • Sure. Give me few minutes. – Marcin Sep 28 '21 at 02:47
  • @mmz take a look at my answer for some suggestions that might help you further reduce size. – rv.kvetch Sep 28 '21 at 04:41

2 Answers


I can't explain the 50mb+ difference, especially since s3fs < 100kb on PyPi.

That's simple enough to explain. As expected, s3fs has internal dependencies on AWS libraries (botocore in this case). The good news is that boto3 (which bundles botocore) is already included in AWS Lambda (see this link for which libraries are available in Lambda), so you can exclude botocore from your zipped dependencies and save up to ~50MB in total size.

See the above link for more info. Here are the libraries you can safely remove from your zipped artifact while still being able to run the code on an AWS Lambda function using the Python 3.8 runtime:

  • boto3
  • botocore
  • docutils
  • jmespath
  • pip
  • python-dateutil (installs the dateutil package)
  • s3transfer
  • setuptools
  • six (installs six.py)
  • urllib3 (if needed, bundled dependencies like chardet could also be removed)
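As a sketch, the list above can be applied to the install directory before zipping; note the on-disk names differ from the distribution names (python-dateutil installs dateutil/, six installs a single six.py file):

```shell
# Directory from the question's Docker layout.
SITE="python/lib/python3.8/site-packages"

# Package directories already provided by the Python 3.8 Lambda runtime.
for p in boto3 botocore docutils jmespath pip dateutil s3transfer setuptools urllib3; do
  rm -rf "$SITE/$p"
done

# six installs a single module file rather than a package directory.
rm -f "$SITE/six.py"
```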

You can also use a bash script to recursively get rid of the following (junk) directories that you don't need:

  • __pycache__
  • *.dist-info (example: certifi-2021.5.30.dist-info)
  • tests - only possibly; I can't confirm this is always safe. If you do choose to recursively remove all tests folders, first check that nothing breaks on Lambda, since in rare cases such a package could be imported in code.

Do all this and you should easily save roughly 60MB in zipped artifact size.
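The directory cleanup above can be sketched as a short script; the __pycache__ and *.dist-info removals are safe, while the tests removal is the risky step called out above:

```shell
#!/bin/sh
# Run from the layer root (the directory that contains python/).
SITE="python/lib/python3.8/site-packages"

# Bytecode caches are regenerated at runtime and never needed in a layer.
find "$SITE" -name '__pycache__' -type d -prune -exec rm -rf {} +

# Wheel metadata (e.g. certifi-2021.5.30.dist-info) is not imported at runtime.
find "$SITE" -name '*.dist-info' -type d -prune -exec rm -rf {} +

# Risky: bundled test suites; verify the function still imports cleanly after.
find "$SITE" -name 'tests' -type d -prune -exec rm -rf {} +
```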

rv.kvetch
  • Do you have a reference for those libraries already included in AWS Lambda? – ogdenkev Sep 28 '21 at 04:43
  • Yes, it should be included in the link in the answer. But anyways, here is the reference: https://gist.github.com/gene1wood/4a052f39490fae00e0c3#file-all_aws_lambda_modules_python3-7-txt – rv.kvetch Sep 28 '21 at 04:46
  • For libraries in the Python 3.8 runtime, you can just do a text search for '3.8' and you should find a post that lists the 3.8 specific libraries. – rv.kvetch Sep 28 '21 at 04:46
  • What about binary Wheel packages - don't they "install" files to `*.dist-info`? – Fredrik Wendt May 15 '22 at 13:38
  • @FredrikWendt hmm, I'm not sure actually. do you have an example of a package that installs files to a `*.dist-info` folder? – rv.kvetch Aug 15 '23 at 17:50

The key idea behind shrinking your layers is to identify what pip installs and what you can get rid of, usually manually.

In your case, since you are only slightly above the limit, I would get rid of pandas/tests. So before you create your zip layer, you can run the following in the layer's folder (mylayer from your past question):

rm -rvf python/lib/python3.8/site-packages/pandas/tests

This should trim your layer below the 262MB limit after unpacking. In my test it is now 244MB.

Alternatively, you can go over the python folder manually and remove any other tests, documentation, examples, etc. that are not needed.
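To find further candidates for manual trimming, it helps to list the heaviest entries in site-packages first (a sketch; du -h with sort -h is available on GNU coreutils and recent BSD tools):

```shell
# Show the largest packages in the layer, biggest first.
du -sh python/lib/python3.8/site-packages/* | sort -rh | head -n 15
```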

Marcin