The problem I have is similar to this SO question, but the answer there doesn't work for me. I am trying to import a Python library (let's say xgboost) from the /tmp folder in AWS Lambda.

The requests library is added to a Lambda layer, and here is what I did:

import json
import io
import os
import zipfile
import requests
import sys

# Make /tmp importable so packages extracted there can be found
sys.path.insert(0, '/tmp/')
sys.path.append('/tmp/')

os.environ["PYTHONPATH"] = "/var/task"

def get_pkgs(url):
    print("Getting Packages...")
    # 'resp' instead of 're', to avoid shadowing the stdlib re module
    resp = requests.get(url)
    z = zipfile.ZipFile(io.BytesIO(resp.content))
    print("Extracting Packages...")
    z.extractall("/tmp/")
    print("Packages are downloaded and extracted.")
    
def attempt_import():
    print("="*50)
    print("ATTEMPT TO IMPORT DEPENDENCIES...")
    print("="*50)
    import xgboost
    print("IMPORTING DONE.")
    
def main():
    URL = "https://MY_BUCKET.s3.MY_REGION.amazonaws.com/MY_FOLDER/xgboost/xgboost.zip"

    get_pkgs(URL)
    attempt_import()
    
def lambda_handler(event, context):
    main()
    return "Hello Lambda"

The error I get is `[ERROR] ModuleNotFoundError: No module named 'xgboost'`. I gave my S3 bucket all the necessary permissions, and I am positive that Lambda can access the `.zip` file, since the `requests.get` call works and the variable `z` returns:

<zipfile.ZipFile file=<_io.BytesIO object at 0x7fddaf31c400> mode='r'>
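
One quick way to narrow an error like this down is to check what the extraction actually put in /tmp, since the import only works if the package folders sit at the top level. A minimal debug sketch (the two-level listing is just an illustrative helper, not part of the original code):

import os

# List /tmp two levels deep to verify the package folders
# (e.g. xgboost/) sit directly under /tmp rather than inside
# an extra parent folder.
for name in sorted(os.listdir("/tmp")):
    path = os.path.join("/tmp", name)
    print(name)
    if os.path.isdir(path):
        for child in sorted(os.listdir(path)):
            print("   ", child)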

Makaroni
  • Downloading packages in the Lambda execution is wasting your $. Instead, you should either package your dependencies into the deployment package or build a Lambda layer. – jellycsc Mar 10 '21 at 17:00
  • @jellycsc The issue is I already have multiple packages in 5 layers, which are close to 260MB, the limit. The temporary Lambda folder has an additional 512MB of space, so this solution can work for me. – Makaroni Mar 10 '21 at 17:12
  • Ok, I see. You can try EFS integration then. – jellycsc Mar 10 '21 at 17:14
  • @jellycsc Do you mean Sagemaker? Can you send some reference/material? – Makaroni Mar 10 '21 at 17:16
  • No, here is what I mean: https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/ – jellycsc Mar 10 '21 at 17:32
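
For reference, the EFS route suggested in the comments boils down to mounting a file system into the function and importing from it. A minimal sketch, assuming an access point mounted at /mnt/efs with packages pre-installed under /mnt/efs/python (both paths are assumptions):

import sys

# Assumed mount path configured on the function's EFS access point
sys.path.append("/mnt/efs/python")

# Packages installed onto EFS beforehand
# (e.g. pip install --target /mnt/efs/python xgboost)
# become importable once the directory is on sys.path
import xgboost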

2 Answers

You could try using the boto3 library to download the file from S3 to the /tmp directory, as explained in https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file:

import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
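
Adapted to the flow in the question, that could look like the sketch below; the bucket, key, and paths are placeholders:

import sys
import zipfile

import boto3

sys.path.append("/tmp")

s3 = boto3.resource("s3")
# Placeholder bucket and key: substitute your own values
s3.meta.client.download_file("MY_BUCKET", "MY_FOLDER/xgboost/xgboost.zip", "/tmp/xgboost.zip")

# Extract so the package folders land directly under /tmp
with zipfile.ZipFile("/tmp/xgboost.zip") as z:
    z.extractall("/tmp/")

import xgboost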
Naman
  • This could be the solution, but the issue is that when you try to import a large Python package (such as `xgboost`), the downloaded `.zip` file and its extracted folder in the `/tmp` directory are larger than 500MB, which results in: `[ERROR] OSError: [Errno 28] No space left on device` – Makaroni Mar 20 '21 at 14:31
  • The `/tmp` directory (ephemeral storage) of a Lambda function can be expanded up to 10GB now. – dsumsky Jul 20 '22 at 06:51

Actually, my code above works, and I had a rather silly error. Instead of zipping the xgboost package folders (xgboost, xgboost.libs and xgboost.dist-info) directly, I zipped their parent folder, which I had named package-xgboost, and that didn't work in AWS Lambda. Be sure that you actually zip those three folders directly.

Also, make sure your xgboost library is up to date. Previously I used version 1.2.1, which didn't work either. Upgrading the library and zipping the newest xgboost version (in my case 1.3.3) finally worked.
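
To verify the archive layout before uploading, a quick check like this helps (the filename is a placeholder):

import zipfile

with zipfile.ZipFile("xgboost.zip") as z:
    # Top-level entries should be the package folders themselves,
    # e.g. 'xgboost', not a parent folder like 'package-xgboost'
    print(sorted({name.split("/")[0] for name in z.namelist()}))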

Makaroni