20

START RequestId: 3d5691d9-ad79-4eed-a26c-5bc3f1a23a99 Version: $LATEST
Unable to import module 'lambda_function': No module named 'pandas'
END RequestId: 3d5691d9-ad79-4eed-a26c-5bc3f1a23a99

I'm using Windows 7 64-bit as the host OS.

What I want to do

I simply want to use pandas in the AWS Lambda environment, just as I use it on Windows. I am looking for a simple solution for Lambda.

What I have tried so far

  • Installed Xubuntu on a virtual box.
  • Created a virtual environment called myvenv in Xubuntu on VirtualBox.
  • Then I installed pandas in myvenv (Python 3.6).
  • Thereafter, I copied the folder contents in myvenv at location '/usr/local/lib/python3.6/site-packages/' to my host OS.
  • In the host OS (Windows 7), I created a folder called packs and pasted the contents of myvenv into it.
  • Created a lambda_function.py script in packs in the host OS (Windows 7).
  • I then zipped the folder packs using 7zip and uploaded it as a zip to Lambda.
  • In Lambda, the lambda function handler name is, lambda_handler(). The code snippet looks like,

import pandas as pd

def lambda_handler(event, context):

    dates = pd.date_range('2019001', periods=6)

    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print(df)
  • The handler is named as lambda_function.lambda_handler. I have given the lambda-role AWSLambdaFullAccess permission.
  • The time out is set to 4 min and 3 sec.
  • The test event looks like

    { "key1": "This will be printed if all OK" }

I have tried the following solutions:

  • Tried precompiled linux-compatible binaries for pandas & numpy from here -- no luck.
  • In Lambda, changed the Handler info to python_filename.function_name. For my case, it was lambda_function.lambda_handler -- failed with no module named 'pandas' error.
  • Placed the lambda function in the root folder, zipped the folder using 7zip, and uploaded it to the S3 bucket. For my case, I placed the function at location python\lib\python3.6\site_packages\lambda_function.py -- failed with no module named 'pandas' error.
  • Already tried these related solutions posted on SO, 1, 2, 3, 4, 5, 6

Note: I do not want to use Docker, because I do not know how to use it and I'm not willing to learn it as I'm exasperated now. I'm coming from a windows environment (it sucks, I now know.)

Any ideas on how to get this to work?

mnm
  • been there, and I really don't recommend zipping your dependencies in windows (permissions and all will be your concerns). I haven't tried installing pandas inside a lambda but I do have experience trying to install other libraries (i.e. `psycopg2`). Though I don't go through some of the difficult steps you've described, what I usually do is just (1) Create a folder; (2) Add python files in created folder; (3) Install dependencies directly on that folder (i.e. `pip install -t . lib1 lib2`); (4) Zip all the contents (`zip -r lambda.zip .`); (5) Upload zip file to lambda; – fixatd Aug 28 '19 at 09:25
  • @fixatd thank you for the response. The solution suggested is something that I have already tried with no luck. – mnm Aug 28 '19 at 09:32
  • Ah, must have been one of the solutions you've outlined. Probably missed that one. Can't say for certain why yours fail though as I've not tried with `pandas` – fixatd Aug 28 '19 at 09:34
  • from cli, go to the folder location where the lambda_function.py located and `pip install -t . pandas` and then zip the folder, upload it. – Lamanus Aug 28 '19 at 15:13
  • @Lamanus thank you for the response. The solution suggested is something that I have already tried with no luck – mnm Aug 30 '19 at 01:27
  • Are you by any chance using 32-bit Windows? – Noel Llevares Sep 06 '19 at 19:24
  • @dashmug I'm using windows 64-bit. Updated same in the question. – mnm Sep 07 '19 at 00:16
  • would creating a Lambda layer from a Cloud 9 instance be an applicable solution for the use case? – jmp Sep 07 '19 at 04:57
  • @jmp I think that would be too easy a solution! Don't you think? If I've to use `Cloud9` with lambda layer then why not just use `AWS Sagemaker`? In Sagemaker, I don't even have to use a lambda layer to use pandas.. Nope that will not be an applicable solution. Because, be it `Cloud9` or `Sagemaker` they are both far more expensive (in terms of money) than just using a lambda layer. – mnm Sep 07 '19 at 05:33
  • Is the Xubuntu guest OS also 64-bit? – Noel Llevares Sep 07 '19 at 10:22
  • Well the cloud 9 instance would just be used for creation of the layer you can turn it off later. Alternatively, a small t2 ec2 instance would work too. Cloud 9 is just bit easier for me since it has a nice ide and terminal – jmp Sep 07 '19 at 10:23
  • We'll need some Amazon Linux instance to get the right binaries for lambda since it also uses Amazon Linux. I don't think there's a way without using docker or Sam to download the right binaries on Windows – jmp Sep 07 '19 at 10:27
  • @jmp I have the right binaries. Hope you read the question. Because I have stated all steps undertaken. I strongly suspect the issue is with the broken path, that is why pandas module cannot be found. – mnm Sep 07 '19 at 11:31
  • Hmm sorry about that, the Lambda documentation suggests that the library will need to be at the same level in the file tree as the actual file executed. https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html#python-package-venv `cd v-env/lib/python3.7/site-packages` `zip -r9 ${OLDPWD}/function.zip .` `cd $OLDPWD` `zip -g function.zip function.py` If you want to import it currently I think you can do this using `sys.path.insert(1, '/path/to/application/app/folder')`(not tested) as shown here https://stackoverflow.com/a/4383597/112233 – jmp Sep 07 '19 at 15:14
  • I took the zip upload to s3 here and moved everything to the root of the zip and now I get `Original error was: No module named 'numpy.core._multiarray_umath'` which suggests that this is the right answer since the error is different now. i'll see if I can get it to work with the code provided here – jmp Sep 07 '19 at 15:28
  • @jmp great that you've found a breakthrough.. if you can get it to work, then post your solution as an answer (but I'll have to verify it that it can work for me). Please ensure to clearly outline every step. Note, that I'm using a Windows 7 64-bit machine, so I will really appreciate it if your proposed answer is descriptive or self-explanatory. – mnm Sep 07 '19 at 16:01

4 Answers

20

I was able to import the pandas library successfully using a Lambda layer and an Amazon Linux Cloud9 instance. These are the commands I executed in the Cloud9 instance, followed by the Lambda function's output. I had to change the code slightly since it was failing with an import error and a string value error.

Alternatively, these commands can also be executed on an EC2 instance. If it's not possible to use the SAM CLI (which uses Docker) or plain Docker on Windows, we'll need to use an Amazon Linux instance to build everything, since that's what AWS Lambda currently uses. I don't believe using an Ubuntu instance will work here.
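To see which platform a package archive was built for before uploading it, one rough check is to list the compiled extension files in the zip and look at the platform tag in their names. This is only a minimal sketch: the helper names `native_extensions` and `looks_built_for` are made up for illustration, and the file-name tag is a heuristic (some compiled files carry no tag at all), not a guarantee that the binaries will load in Lambda.

```python
import io
import zipfile

# Compiled extension modules end in .so on Linux and .pyd on Windows.
NATIVE_SUFFIXES = (".so", ".pyd")


def native_extensions(zip_source):
    """Return the sorted names of compiled extension files inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.endswith(NATIVE_SUFFIXES)
        )


def looks_built_for(names, platform_tag="x86_64-linux-gnu"):
    """Heuristic: True if every compiled file name carries the platform tag.

    An empty list also returns True, since a pure-Python archive has
    nothing platform-specific in it.
    """
    return all(platform_tag in name for name in names)
```

For example, `looks_built_for(native_extensions("panda_layer.zip"))` would return False for an archive containing Windows `.pyd` files, which is one way to catch a package that was installed on the wrong OS.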

Commands:

python --version
Python 3.6.8

# https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
# python 3.6 uses Amazon Linux currently 

mkdir project
cd project
virtualenv v-env
source ./v-env/bin/activate
pip install pandas
deactivate

# creating layer
# https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-path
mkdir python
cd python
cp -r ../v-env/lib64/python3.6/site-packages/* .
cd ..
zip -r panda_layer.zip python
aws lambda publish-layer-version --layer-name pandas --zip-file fileb://panda_layer.zip --compatible-runtimes python3.6 

The publish-layer-version command creates a new AWS Lambda layer in the region given in the command or in the config file for the CLI.

A Lambda layer makes the library available to the Lambda function's code without bundling it into the deployment package. This also allows the use of the online code editor in Lambda, since the deployment package stays under the limit of 3 MB.

I applied the Lambda layer by clicking on the Layers button in the web console and choosing the layer version that I most recently published. I have a second version there because the first time I attempted this I put in the contents of the lib directory, which isn't for a 64-bit OS, and my code failed in AWS Lambda.

[Screenshot: Lambda web console]

Alternatively, you can also apply the layer using the CLI command update-function-configuration.
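The same step can be scripted from Python. The sketch below assumes `client` is a boto3 Lambda client (for example `boto3.client("lambda")`), which is not something this answer used; `attach_layer` is a hypothetical helper name. Because `update_function_configuration` replaces the whole list of layers, the sketch reads the current layers first and appends to them. It does not handle swapping out an older version of the same layer.

```python
def attach_layer(client, function_name, layer_arn):
    """Add layer_arn to a function's layers, keeping the ones already attached.

    `client` is assumed to be a boto3 Lambda client; any object exposing the
    same two methods works, which also makes the helper easy to test.
    """
    config = client.get_function_configuration(FunctionName=function_name)

    # The "Layers" key is absent when the function has no layers yet.
    layer_arns = [layer["Arn"] for layer in config.get("Layers", [])]

    if layer_arn not in layer_arns:
        layer_arns.append(layer_arn)

    # Layers is the complete list the function should end up with.
    client.update_function_configuration(
        FunctionName=function_name,
        Layers=layer_arns,
    )
    return layer_arns
```

The layer ARN to pass in is the version ARN that publish-layer-version returns, ending in the version number.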

Lambda function code I used:

import pandas as pd
import numpy as np

def lambda_handler(event, context):
    dates = pd.date_range(start='1/1/2018', end='1/08/2018')
    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
    print(str(df))


Lambda output:

START RequestId: 27f09b6c-a4cd-49dd-bb3e-ae6fc7cd5850 Version: $LATEST
                   A         B         C         D
2018-01-01 -1.040318  0.450841 -0.381687 -0.105480
2018-01-02 -1.381793 -0.481572  0.828419 -0.885205
2018-01-03  1.437799 -0.649816 -0.577112  0.400670
2018-01-04 -0.730997 -0.778775 -1.514203  1.165661
2018-01-05  1.963595 -1.137054  0.920218  0.960210
2018-01-06 -0.429179 -0.745549  1.482562  0.298623
2018-01-07 -1.082388 -0.529476 -1.051663  1.616683
2018-01-08  0.042779 -2.338471 -0.142992  0.680399
END RequestId: 27f09b6c-a4cd-49dd-bb3e-ae6fc7cd5850
REPORT RequestId: 27f09b6c-a4cd-49dd-bb3e-ae6fc7cd5850  Duration: 536.76 ms Billed Duration: 600 ms Memory Size: 128 MB Max Memory Used: 122 MB Init Duration: 1721.51 ms   
XRAY TraceId: 1-5d741e40-1311daa29fc16c74735988fc   SegmentId: 61a595dd3492c331 Sampled: false  
jmp
  • I updated my answer with some more info about that cli command. In my case, the version of pandas I downloaded was for python3 and was successful in a python3.6 AWS Lambda function. Also elaborated more on my earlier comment. – jmp Sep 08 '19 at 14:57
  • thanks for the idea. After several failed experiments, with lambda, s3, SAM and other `band of brothers`, I finally settled on your suggestion. Life has been great since then. – mnm Oct 14 '19 at 06:52
  • This is the first and the only solution that actually worked for me. Thanks!!! – donkz Feb 20 '20 at 21:41
  • Thanks for this. I've spent **hours** trying different things and this is the only solution that's worked for me. I think it's really important that 1) you're using the same OS that AWS Lambda is using, i.e. you're doing everything in Cloud9, 2) same Python version in the Lambda vs. Cloud9. I also made sure that I had the latest version of `pip`: `sudo python -m pip install --upgrade pip`. – Matt Sosna Mar 25 '21 at 13:05
2

To use pandas in an AWS Lambda environment with a Python runtime, the simplest way is to:

  • Scroll down in Code tab of Lambda console
  • "Add Layer"
  • Choose "AWS Layers" (choose a layer from a list of layers provided by AWS)
  • AWSDataWrangler-Python37 (which includes pandas)

If you're working with automation for the deployment, you'll want to find the ARN. Even in the console, you'll probably want to just take the extra step and choose the "Specify an ARN" option, since it seems like the dropdown menu for "AWS layers" isn't caught up with the latest AWS Layers that are actually available.

You can find a library of all the AWS-hosted Lambda layers at https://serverlessrepo.aws.amazon.com/applications. If you search for pandas there, you'll find 15 results at the moment, including aws-sdk-pandas-layer-py3-7 (3.8 and 3.9 are there too). If you click into the detail page for that layer, you'll see the ARN for that AWS-hosted layer.

You can also just click "Deploy" on that details page. Then the layer will be available to you from the "Custom layers" dropdown in your Lambda console.
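When automating with ARNs, it can help to check that a layer ARN is well formed and in the same region as your function before using it. Here is a minimal sketch, assuming the usual layer version ARN shape `arn:aws:lambda:<region>:<account-id>:layer:<name>:<version>`; `parse_layer_arn` and `LayerArn` are hypothetical names, and the account ID in the usage example is a placeholder.

```python
from typing import NamedTuple


class LayerArn(NamedTuple):
    region: str
    account_id: str
    name: str
    version: int


def parse_layer_arn(arn):
    """Split a Lambda layer version ARN into its parts.

    Raises ValueError if the string does not have the expected shape.
    """
    parts = arn.split(":")
    if (
        len(parts) != 8
        or parts[0] != "arn"
        or parts[2] != "lambda"
        or parts[5] != "layer"
        or not parts[7].isdigit()
    ):
        raise ValueError(f"not a layer version ARN: {arn!r}")
    return LayerArn(
        region=parts[3],
        account_id=parts[4],
        name=parts[6],
        version=int(parts[7]),
    )
```

For instance, `parse_layer_arn("arn:aws:lambda:us-east-1:123456789012:layer:pandas:2").region` gives `"us-east-1"`, which you can compare against the region your function is deployed in.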

Yann Stoneman
1

I found this github repo that has pre-built package ARNs. Find the one you want for your AWS region, and then choose "Specify an ARN" when creating the layer and paste in the layer ARN from this github repo:

https://github.com/keithrozario/Klayers

Johnny Wales
0

For the layer approach, make sure that the contents of the uploaded layer package (the site-packages generated by virtualenv) are inside a directory named python. That is, unzipping the package should create a directory named python that holds the contents of site-packages.

 cd venv/lib/python3.6
 mkdir python
 cp -r site-packages/* python
 zip -r layer.zip python

For the approach with the dependencies in the same zip, this structure will be different.
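If the `zip` command isn't available (for example on Windows), the layer layout described above can be produced with Python's standard library instead. This is a minimal sketch: `build_layer_zip` is a hypothetical helper, and it only fixes the directory layout. The packages themselves still have to be installed on a Lambda-compatible Linux.

```python
import zipfile
from pathlib import Path


def build_layer_zip(site_packages, zip_path):
    """Zip the contents of site_packages under a top-level 'python/' directory."""
    site_packages = Path(site_packages)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                # e.g. site-packages/pandas/__init__.py
                #   -> python/pandas/__init__.py
                arcname = Path("python") / path.relative_to(site_packages)
                archive.write(path, arcname.as_posix())
    return zip_path
```

Calling `build_layer_zip("venv/lib/python3.6/site-packages", "layer.zip")` writes an archive whose entries all start with `python/`. Keep the output zip outside the site-packages directory so it isn't added to itself.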

Sahishnu Patil
  • your answer lacks both content and direction. Care to elaborate how it's related to the Question asked? An alternative approach/solution is always welcome, but it should be detailed and put in context. – mnm Sep 05 '21 at 10:45
  • It is quite relevant, as the naming of the directory is important when creating layers in AWS Lambda, and that step is missing from the steps followed in the description. The answer is one of the possible reasons behind the error. – Sahishnu Patil Sep 06 '21 at 05:10