I'm following this blog post to create a runtime environment using Docker for use with AWS Lambda. I'm creating a layer for use with Python 3.8:
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
And then archiving the layer as a zip: zip -9 -r mylayer.zip python
All standard so far. The problem arises with the .zip size, which is > 250 MB and so produces the following error in Lambda: Failed to create layer version: Unzipped size must be smaller than 262144000 bytes.
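For reference, the limit applies to the unzipped size, so it can be checked locally before uploading; a minimal sketch (using the archive name above):

import zipfile

LIMIT = 262144000  # Lambda's unzipped-layer limit, per the error above

with zipfile.ZipFile("mylayer.zip") as zf:
    total = sum(info.file_size for info in zf.infolist())

print(f"unzipped: {total} bytes ({total / 2**20:.1f} MiB), limit: {LIMIT}")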
Here's my requirements.txt:
s3fs
scrapy
pandas
requests
I'm including s3fs since I get the following error when trying to save a parquet file to an S3 bucket using pandas: [ERROR] ImportError: Install s3fs to access S3.
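For context, the error comes from a call like this, a plain to_parquet() with an s3:// path (the bucket name here is hypothetical):

import pandas as pd

# pandas delegates s3:// I/O to s3fs and raises the ImportError above
# when it isn't installed
df = pd.DataFrame({"a": [1, 2, 3]})
df.to_parquet("s3://my-bucket/output.parquet")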
The problem is that including s3fs massively increases the layer size: without s3fs the layer is < 200 MB unzipped.
My most direct question would be: how can I reduce the layer size to < 250 MB while still using Docker and keeping s3fs in my requirements.txt? I can't explain the 50 MB+ difference, especially since s3fs itself is < 100 KB on PyPI.
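To see where the weight actually goes, here's a quick sketch that breaks down the per-package footprint inside the layer (the path assumes the pip -t target from the docker command above):

from pathlib import Path

# Per-package disk usage inside the unpacked layer, largest first
site = Path("python/lib/python3.8/site-packages")
sizes = []
for entry in site.iterdir():
    files = [entry] if entry.is_file() else [f for f in entry.rglob("*") if f.is_file()]
    sizes.append((sum(f.stat().st_size for f in files), entry.name))
for size, name in sorted(sizes, reverse=True):
    print(f"{size / 2**20:8.1f} MiB  {name}")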
Finally, for those questioning my use of Lambda with Scrapy: my scraper is trivial, and spinning up an EC2 instance would be overkill.