Does anyone have a fully compiled version of pandas that is compatible with AWS Lambda?

After searching around for a few hours, I cannot seem to find what I'm looking for and the documentation on this subject is non-existent.

I need access to the package in a Lambda function; however, I have been unable to get it to compile properly for use in Lambda.

Failing a precompiled version, can anyone provide reproducible steps to create the binaries?

Unfortunately, I have not been able to reproduce any of the guides on the subject, as they mostly combine pandas with scipy, which I don't need and which adds an extra layer of burden.

Moe
  • Check the answer at http://stackoverflow.com/a/43766512/345606 for advice on including Python packages, like Pandas, that have compiled code. – Kevin May 03 '17 at 17:29
  • Check this blog: you can create a pandas layer for Python 3.8 within minutes https://khanakia.medium.com/add-pandas-and-numpy-python-to-aws-lambda-layers-python-3-7-3-8-694db42f6119 – Khanakia Jul 29 '21 at 06:55

15 Answers

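I believe you should be able to use a recent pandas version (possibly the one already on your machine, but only if that machine runs Linux on the same CPU architecture and Python version as your Lambda runtime, because pandas and numpy contain compiled extensions). You can create a Lambda package with pandas yourself like this: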

  1. First, find where the pandas package is installed on your machine: open a Python terminal and type

    import pandas
    pandas.__file__
    

    That should print something like '/usr/local/lib/python3.4/site-packages/pandas/__init__.py'

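  2. Now copy the pandas folder from that location (in this case '/usr/local/lib/python3.4/site-packages/pandas') into your repository, along with the folders of its dependencies (such as numpy) from the same site-packages directory.
  3. Package your Lambda code with pandas like this (add each dependency folder to the first zip command in the same way):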

    zip -r9 my_lambda.zip pandas/
    zip -9 my_lambda.zip my_lambda_function.py
    

You can also upload the package to S3 and have your Lambda function use the code from there:

aws s3 cp my_lambda.zip s3://dev-code/projectx/lambda_packages/
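The copy-and-zip steps above can also be scripted. The following is a minimal Python sketch, assuming the packages were installed for the Lambda platform (Linux, same Python version) in the environment that runs it; `package_dir` and `build_deployment_zip` are hypothetical helper names, not part of any library.

```python
import importlib.util
import os
import zipfile


def package_dir(name):
    """Return the folder of an installed package (what pandas.__file__ points into)."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name!r} is not an installed package")
    return list(spec.submodule_search_locations)[0]


def build_deployment_zip(zip_path, handler_path, packages):
    """Write the handler at the zip root and each package folder beside it."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.write(handler_path, os.path.basename(handler_path))
        for name in packages:
            src = package_dir(name)
            parent = os.path.dirname(src)
            for root, dirs, files in os.walk(src):
                dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
                for filename in files:
                    full = os.path.join(root, filename)
                    # Store paths relative to site-packages, e.g. "pandas/core/frame.py"
                    zf.write(full, os.path.relpath(full, parent))
```

For example, `build_deployment_zip("my_lambda.zip", "my_lambda_function.py", ["pandas", "numpy", "dateutil", "pytz"])`. Single-file modules such as `six.py` have no package folder, so this sketch would need an extra `zf.write` for them.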

Here's the repo that will get you started

Chenna V
  • On a Mac, here's the path: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas – aQ123 Feb 06 '20 at 19:49
  • @hackwithharsha refer to this answer https://stackoverflow.com/a/57969190/358013 – Chenna V Feb 17 '21 at 00:48

After some tinkering and lots of googling, I was able to make everything work and set up a repo that can simply be cloned in the future.

Key takeaways:

  1. All static (compiled) packages have to be compiled on an Amazon Linux EC2 instance.
  2. The Python code needs to load the libraries in the lib/ folder before executing.
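Takeaway 2 could look roughly like the sketch below. It assumes a `lib/` folder next to the handler that holds both the vendored Python packages and any native `.so` libraries built on Amazon Linux; the linked repo's actual loading code may differ, and `load_vendored_libs` is a hypothetical name.

```python
import ctypes
import glob
import os
import sys


def load_vendored_libs(lib_dir):
    """Make lib_dir importable and preload any shared objects (.so files) it contains."""
    lib_dir = os.path.abspath(lib_dir)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)  # searched before the runtime's own site-packages
    loaded = []
    for so_file in sorted(glob.glob(os.path.join(lib_dir, "*.so*"))):
        # RTLD_GLOBAL makes the library's symbols available to extensions loaded later
        ctypes.CDLL(so_file, mode=ctypes.RTLD_GLOBAL)
        loaded.append(os.path.basename(so_file))
    return loaded
```

Call it at the top of the handler module, before `import pandas`, e.g. `load_vendored_libs(os.path.join(os.path.dirname(__file__), "lib"))`. If the `.so` files depend on one another, load order matters; alphabetical order is only a simple default.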

GitHub repo: https://github.com/moesy/AWS-Lambda-ML-Microservice-Skeleton

9000
Moe
  • @dsvensson please take a second look at the repo; it builds the binaries from source. – Moe Jan 19 '17 at 17:43
  • But this requires an EC2 instance. How can I avoid that? – WJA Jan 10 '19 at 09:34
  • @JohnAndrews If you have a Linux machine, the same steps can be done locally. The basic requirement is just that Lambda runs on Linux, so the non-Amazon packages need to be built for Linux. – Aakash Basu Mar 25 '19 at 10:37

The repo mthenw/awesome-layers lists several publicly available AWS Lambda layers.

In particular, keithrozario/Klayers has pandas+numpy and is up to date as of this writing, with pandas 0.25.

Its ARN is arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1
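To attach a published layer such as this one to an existing function from Python, a boto3 sketch follows. `get_function_configuration` and `update_function_configuration` are real boto3 Lambda client calls; the helper names are hypothetical. Because `update_function_configuration` replaces the function's whole layer list, the sketch re-sends the existing layers. It also assumes the function lives in the same region as the layer, since a function can only use layers from its own region.

```python
def layer_region(layer_arn):
    """Return the region field of arn:aws:lambda:<region>:<account>:layer:<name>:<version>."""
    return layer_arn.split(":")[3]


def merged_layers(existing_arns, new_arn):
    """Keep existing layers, drop other versions of the same layer, and append new_arn."""
    base = new_arn.rsplit(":", 1)[0]  # layer ARN without its version number
    kept = [arn for arn in existing_arns if arn.rsplit(":", 1)[0] != base]
    return kept + [new_arn]


def attach_layer(function_name, layer_arn):
    """Add a layer to a function via boto3 (requires AWS credentials)."""
    import boto3  # imported here so the pure helpers above work without boto3

    client = boto3.client("lambda", region_name=layer_region(layer_arn))
    config = client.get_function_configuration(FunctionName=function_name)
    existing = [layer["Arn"] for layer in config.get("Layers", [])]
    return client.update_function_configuration(
        FunctionName=function_name,
        Layers=merged_layers(existing, layer_arn),
    )
```

For example, `attach_layer("my-function", "arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1")`, where `my-function` is a placeholder. The layer's Python version should match the function's runtime (python3.7 here).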

Shadi
  • This was exactly what I was looking for - a solution where I can simply parse an arn to get access to numpy and pandas! Tested just now and it works like a charm. – mfcss Jun 03 '20 at 13:54

I know the question was asked a couple of years ago, and Lambda was at a different stage back then.

I faced similar issues recently, and I thought it would be a good idea to add the newest solution here for future users facing the same problem.

It turns out that Amazon introduced Lambda Layers at re:Invent 2018. It is a great feature. This Medium post describes it much better than I could here: Creating New AWS Lambda Layer For Python Pandas Library

b3rt0
  • Not sure this is an answer to the original question. You still need to create the Lambda Layer via a deployment package which contains the correctly compiled binaries. Lambda Layers just makes those dependencies re-usable across multiple functions. – ashtonium Mar 29 '19 at 12:44

The easiest way to get pandas working in a Lambda function is to use Lambda Layers and AWS Data Wrangler. A Lambda Layer is a zip archive that contains libraries or dependencies. According to the AWS documentation, using layers keeps your deployment package small, which makes development easier.

AWS Data Wrangler is an open-source package that extends the power of pandas to AWS services.

Follow the instructions (under AWS Lambda Layer) here.

Bemullen

Another option is to download the pre-compiled wheel files, as discussed in this post: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-python-package-compatible/

Essentially, you need to go to the project page on https://pypi.org and download the files named like the following:

  • For Python 2.7: module-name-version-cp27-cp27mu-manylinux1_x86_64.whl
  • For Python 3.6: module-name-version-cp36-cp36m-manylinux1_x86_64.whl
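Since the important parts of those names are the tags, a small helper can check a downloaded file before you unpack it. This is a rough sketch based on the wheel filename convention (name-version[-build]-python-abi-platform.whl); it treats any `manylinux*` wheel for the right CPython version and CPU architecture as usable and does not compare the glibc version a manylinux tag requires with the runtime's. The function names and the `cp38` default are assumptions.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tags)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tag sets are dot-separated, e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64"
    return name, version, python_tag, abi_tag, platform_tag.split(".")


def is_lambda_compatible(filename, python_tag="cp38", arch="x86_64"):
    """Rough check: pure-Python wheels, or manylinux wheels for this CPython tag and CPU."""
    _, _, py_tags, _, platforms = parse_wheel_filename(filename)
    if platforms == ["any"]:
        return True  # pure-Python wheel such as pytz or six
    python_ok = python_tag in py_tags.split(".")
    linux_ok = any(p.startswith("manylinux") and p.endswith(arch) for p in platforms)
    return python_ok and linux_ok
```

For the Python 3.6 file name above, call `is_lambda_compatible(name, python_tag="cp36")`.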

Then unzip the .whl files into your project directory and re-zip the contents together with your Lambda code.
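Wheel files are ordinary zip archives, so the unzip step can also be done with Python's standard `zipfile` module. A minimal sketch; the helper name and directory layout are assumptions:

```python
import glob
import os
import zipfile


def extract_wheels(wheel_dir, target_dir):
    """Unpack every .whl in wheel_dir into target_dir and return the wheel names processed."""
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    for wheel in sorted(glob.glob(os.path.join(wheel_dir, "*.whl"))):
        with zipfile.ZipFile(wheel) as zf:  # a wheel is a plain zip archive
            zf.extractall(target_dir)
        extracted.append(os.path.basename(wheel))
    return extracted
```

Afterwards, zip the contents of `target_dir` together with your handler file.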

NOTE: The main Python function file(s) must be in the root folder of the resulting deployment package .zip file. Other Python modules and dependencies can be in sub-folders. Something like:

my_lambda_deployment_package.zip
├───lambda_function.py
├───numpy
│   ├───[subfolders...]
├───pandas
│   ├───[subfolders...]
└───[additional package folders...]
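One way to catch the layout mistake behind the "No module named 'lambda_function'" error reported in the comments is to inspect the archive before uploading it. This sketch assumes the default handler file `lambda_function.py` and expects top-level `numpy/` and `pandas/` folders; `check_deployment_zip` is a hypothetical helper.

```python
import zipfile


def check_deployment_zip(zip_path, handler_module="lambda_function", packages=("numpy", "pandas")):
    """Return a list of layout problems in a Lambda deployment zip (empty list: layout looks fine)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    problems = []
    handler_file = handler_module + ".py"
    if handler_file not in names:
        nested = [n for n in names if n.endswith("/" + handler_file)]
        if nested:
            problems.append(f"{handler_file} is at {nested[0]}; move it to the zip root")
        else:
            problems.append(f"{handler_file} is missing")
    for pkg in packages:
        if not any(n.startswith(pkg + "/") for n in names):
            problems.append(f"no top-level {pkg}/ folder")
    return problems
```

A common cause of the nested case is zipping the project folder itself rather than its contents.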
ashtonium
  • This is not working. I got pandas-0.24.2 and its dependencies (numpy-1.16.2, python-dateutil-2.8.0, pytz-2018.9, six-1.12.0), all cp36-cp36m-manylinux1_x86_64.whl from https://pypi.org/, unzipped them and put them in a single Windows folder. I added the Python code, zipped it and uploaded it, and I'm getting this error: Unable to import module 'lambda_function': No module named 'lambda_function' – Aakash Basu Mar 25 '19 at 10:35
  • Sounds like it's expecting the default Python file name. Is your `lambda_function.py` file at the root level of your .zip file along with the various package folders? – ashtonium Mar 28 '19 at 21:35

@ashtonium's answer actually works and is most likely the easiest; however, a few additional steps are required. Also, pandas requires pytz (mentioned in the link provided by @b3rt0), so that package is needed as well.

  1. Download the .whl files from PyPI (the pandas file ends with ...manylinux1_x86_64.whl; there is only one pytz file of relevance)
  2. Unzip the .whl files using a terminal command, e.g. unzip filename.whl (Linux/macOS)
  3. Create a new folder structure python/lib/python3.7/site-packages/ (swap 3.7 for the version of your choice)
  4. Move the folders from step 2 into the site-packages folder from step 3
  5. Zip the root folder of the new structure, i.e. python
  6. Create a new layer in the AWS Management Console and upload the zip file
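Steps 3 to 5 can be sketched in Python as well. This assumes the unzipped package folders from step 2 sit in one source directory; `build_layer_zip` is a hypothetical name, and the python/lib/pythonX.Y/site-packages prefix mirrors step 3.

```python
import os
import zipfile


def build_layer_zip(src_dir, zip_path, python_version="3.7"):
    """Zip the package folders in src_dir under python/lib/pythonX.Y/site-packages/; return the file count."""
    prefix = f"python/lib/python{python_version}/site-packages"
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
            for filename in files:
                full = os.path.join(root, filename)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, f"{prefix}/{rel}")
                count += 1
    return count
```

Write the zip outside `src_dir`, otherwise the walk may pick up the archive being written.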

This is a very common question; I hope my solution helps.

Update on Aug 19, 2020: Wheel files aren't available for all packages. In these cases you can skip to step 3, go into the site-packages folder and install the package there with pip3 install PACKAGE_NAME -t . (no venv required). Some packages are easier than others; some are trickier. Psycopg2, for example, requires you to move only one of its two (as of this writing) package folders.

/Cheers

user3661992
  • This was perfect - thanks! I've been fighting this for a couple of days now and it was getting frustrating to say the least. I've read articles that pretty much said the same thing as this but with so much fluff around them that it made them hard to follow. This was the first concise explanation that I have found and I was able to get my layer up and running in a couple of minutes. – B. Youngman Jul 13 '20 at 20:09
  • No worries @B.Youngman! I just used the solution yesterday. – user3661992 Aug 19 '20 at 05:37

I managed to deploy pandas code to AWS Lambda using the python3.6 runtime. These are the steps I followed:

  1. Add the required libraries to requirements.txt
  2. Build the project in a Docker container (using the AWS SAM CLI: sam build --use-container)
  3. Run the code (sam local invoke --event test.json)

Here is a helper template: https://github.com/ysfmag/aws-lambda-py-pandas-template

nofinator
ymaghzaz

There are some precompiled packages on GitHub by ryfeus.

Rajesh

My solution has been to maintain two requirements.txt-style files listing the packages that go in my layer: one named provided_packages.txt and one named provided_linux_installs.txt.

Before deployment (if the packages are not already installed), I run:

pip install -r provided_packages.txt -t layer_name/python/lib/python3.8/site-packages/.
pip download -r provided_linux_installs.txt --platform manylinux1_x86_64 --no-deps -d layer_name/python/lib/python3.8/site-packages

cd layer_name/python/lib/python3.8/site-packages 
unzip \*.whl
rm *.whl
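If you prefer to let pip resolve dependencies instead of listing them by hand, `pip install` also accepts the platform options together with `--only-binary=:all:`. The sketch below only builds the command list; the default platform tag `manylinux2014_x86_64` and Python version are assumptions to adjust for your runtime, and `pip_lambda_install_cmd` is a hypothetical name.

```python
import sys


def pip_lambda_install_cmd(requirements, target, python_version="3.8",
                           platform="manylinux2014_x86_64"):
    """Build a pip command that installs Linux wheels for a Lambda runtime into `target`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--requirement", requirements,
        "--target", target,
        "--platform", platform,
        "--implementation", "cp",
        "--python-version", python_version,
        # With --platform set, pip needs --only-binary=:all: (or --no-deps);
        # this form still resolves and installs dependencies.
        "--only-binary=:all:",
    ]
```

Run it with `subprocess.run(pip_lambda_install_cmd(...), check=True)`. Because `pip install --target` writes unpacked packages, the separate `unzip`/`rm *.whl` step is not needed with this variant.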

Then deploy normally (I am using cdk synth & cdk deploy \* --profile profile_name)

In case it's helpful, my provided_linux_installs.txt looks like this:

pandas==1.1.0
numpy==1.19.1
pytz==2020.1
python-dateutil==2.8.1
Scott Brenstuhl
  • Interesting solution, why do you use `--no-deps` for your second `pip` call? I assume this would not be correct if you were using something that had another dependency? – Jimbo Jan 21 '21 at 03:41
  • just ran the code without and got this nice message .... "When restricting platform and interpreter constraints using --python-version, --platform, --abi, or --implementation, either --no-deps must be set, or --only-binary=:all: must be set and --no-binary must not be set (or must be set to :none:)." – Jimbo Jan 21 '21 at 05:04
  • @Jimbo that sounds right. To be honest I don't completely remember but I _think_ that I had to do a round or two of getting error messages to make sure I had all of the dependencies in one of the requirements files (with a notable one being `pytz` which also needs to be loaded as the linux version) – Scott Brenstuhl Jan 22 '21 at 06:03

I have started to maintain a GitHub repo for easy and quick access to layers. https://github.com/kuharan/Lambda-Layers

I have been using these for my open-source projects and stuff.

# All steps are done on an AWS EC2 Linux (free tier) instance so that the libraries are compatible with the Lambda environment

# Install the required packages into a "packages" folder
mkdir packages
cd packages
pip3 install -t . pandas
pip3 install -t . numpy --upgrade
pip3 install -t . wikipedia --upgrade
pip3 install -t . scikit-learn --upgrade
pip3 install -t . pickle-mixin --upgrade
pip3 install -t . fuzzywuzzy --upgrade

# Remove unnecessary files
rm -rf *.whl *.dist-info __pycache__

# Go back up and create the folder layout that Lambda layers expect
cd ..
mkdir -p build/python/lib/python3.6/site-packages

# Move all the files from the packages folder into the site-packages folder
mv /home/ec2-user/packages/* build/python/lib/python3.6/site-packages/

# Move into the build folder
cd build

# Zip everything, starting from the python folder
zip -r python.zip .

Upload the zip file as a Lambda layer.

Suraj Rao

Python 3.8, Windows 10, AWS Lambda, pandas

You need to do the following steps on a Linux machine with Python 3.8:

  1. sudo mkdir python
  2. sudo pip3 install --target python pandas
  3. sudo zip -r pandas.zip python
  4. Upload pandas.zip to an S3 bucket in the same region as your function (the bucket does not need to be public) and copy its S3 URL.
  5. Create a new Lambda layer using the S3 URL from above.
  6. Add the layer to your Lambda function and import pandas as pd as you normally would.

No linux machine? Launch an Ubuntu EC2 instance or container:

  1. sudo apt install python3.8 zip unzip python3-pip
  2. Run steps 1-3 above.
  3. Now copy the zip to your local machine. Open a command terminal, change to the folder containing your EC2 instance's .pem file and run: scp -i yourPemFile.pem ubuntu@'EC2.Instance.IP.Here':/home/path/to/pandas.zip C:\Users\YourUser\Desktop
  4. Run steps 4-6 above.

*For step 3 above: look up your EC2 instance's IP and insert it. You may get an error about the permissions on the .pem file; if you do, right-click the .pem file > Properties > Security > Advanced > Disable inheritance, and make sure only your user is listed under "Permission entries." Lastly, fix the paths so they point to where pandas.zip is on the EC2 instance and where you want the file to end up locally.

**Pay attention to the Python runtime of the Lambda function. Make sure it matches the Python version you used for the pip steps (which should be 3.8).

***The top-level folder is named "python" for a reason: per the AWS documentation, Lambda extracts layers to /opt and adds /opt/python to the Python path.
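Following up on the runtime note, you can check a finished zip for compiled modules that were built for a different Python version or operating system. CPython extension modules carry tags in their filenames (for example `.cpython-38-x86_64-linux-gnu.so`, or `.cpython-37m-...` on 3.6/3.7), and Windows builds end in `.pyd`. A minimal sketch; the function name is hypothetical and the default platform assumes an x86_64 function (use `aarch64-linux-gnu` for arm64).

```python
import re
import zipfile

# e.g. "lib.cpython-38-x86_64-linux-gnu.so"; Python 3.6/3.7 add an "m": "cpython-37m-..."
_EXT_MODULE = re.compile(r"\.cpython-(\d)(\d+)m?-([A-Za-z0-9_-]+)\.so$")


def extension_mismatches(zip_path, runtime="python3.8", platform="x86_64-linux-gnu"):
    """List compiled modules in a zip that were not built for the given runtime and platform."""
    wanted_version = runtime.replace("python", "", 1)  # "python3.8" -> "3.8"
    mismatches = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".pyd"):
                mismatches.append(name)  # Windows extension module, never loadable on Lambda
                continue
            match = _EXT_MODULE.search(name)
            if match:
                version = f"{match.group(1)}.{match.group(2)}"
                if version != wanted_version or match.group(3) != platform:
                    mismatches.append(name)
    return mismatches
```

An empty list only means no mismatched extension tags were found; shared libraries without a CPython tag are not checked.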

grantr

After lots of googling and messing around, I found that the concept of layers is great, and it works for me.

This GitHub repo from keithrozario has loads of pre-built layers, including some great ones like pandas, requests and sqlalchemy, that you can simply add to your Lambda via their ARNs.

I've created a template to compile and upload a layer (containing Python dependencies) to Lambda using the AWS CLI, which you can find in my GitLab repo here.

I'm running this on an Amazon Linux EC2 instance, using a virtual environment (venv) to install libraries from a requirements.txt file and then uploading the zipped files to Lambda with the AWS CLI.

Note the folder structure my_zip_file/python/binaries, which is required for Lambda.

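Note: pandas is quite a large library. Your layer must stay within Lambda's size limits: AWS documents 50 MB (zipped) for a direct upload and 250 MB (unzipped) for the function code plus all of its layers.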
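A small pre-upload check against those limits, as a sketch: it reads sizes with the standard `zipfile` module and assumes the documented MB values are binary megabytes (MiB). The constant and function names are hypothetical.

```python
import os
import zipfile

# Documented Lambda limits; treating "MB" as MiB is an assumption.
MAX_ZIPPED_DIRECT_UPLOAD = 50 * 1024 * 1024   # zip uploaded directly, not via S3
MAX_UNZIPPED_TOTAL = 250 * 1024 * 1024        # function code plus all layers


def zip_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a deployment or layer zip."""
    with zipfile.ZipFile(zip_path) as zf:
        unzipped = sum(info.file_size for info in zf.infolist())
    return os.path.getsize(zip_path), unzipped


def size_warnings(zip_path, other_unzipped_bytes=0):
    """Compare a zip with the limits; pass the unzipped size of the function's other code and layers."""
    zipped, unzipped = zip_sizes(zip_path)
    warnings = []
    if zipped > MAX_ZIPPED_DIRECT_UPLOAD:
        warnings.append("too large for a direct upload; upload it via S3 instead")
    if unzipped + other_unzipped_bytes > MAX_UNZIPPED_TOTAL:
        warnings.append("unzipped size exceeds the limit for function code plus all layers")
    return warnings
```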

You may also encounter the horrible "OpenBLAS WARNING - could not determine the L2 cache size on this system" message. I had to increase the memory from the default 128 MB for the Lambda to run successfully.

Dharman

After searching around for a few hours, I cannot seem to find what I’m looking for and the documentation on this subject is non-existent.

So I decided to build the libraries myself to support the Amazon Linux 2 architecture.

Read the full blog post here: https://khanakia.medium.com/add-pandas-and-numpy-python-to-aws-lambda-layers-python-3-7-3-8-694db42f6119

Khanakia