
I'm using the boto3 client to get a download link for the zip file of a Lambda function on AWS.

I want to transfer this zip file from that download link directly to an S3 bucket, without storing or piping anything on my machine.

Is it possible to do this with the available AWS APIs?
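For context, getting that link with boto3 currently looks roughly like this (the function name is a placeholder):

import boto3

lambda_client = boto3.client('lambda')

# GetFunction returns a pre-signed URL to the deployment package under Code.Location.
response = lambda_client.get_function(FunctionName='my-function')
download_url = response['Code']['Location']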

Edit: Can DataSync perhaps help with this?

  • Yes, you can upload from stdin https://stackoverflow.com/questions/23553230/awscli-s3-upload-via-stdin – Paolo Jul 25 '23 at 14:58
  • The idea is to avoid as much processing as possible locally. Uploading from stdin would mean that I have to wait for the upload to finish, and my machine would have to stay on potentially for a longer time, which is not an assumption I can make. I'm really looking for a way to do this with API calls only. I want AWS to do the gruntwork. – user569685 Jul 25 '23 at 15:00
  • https://stackoverflow.com/a/53701565/1273882 – Ankush Jain Jul 25 '23 at 15:00
  • @AnkushJain Unless I'm missing something, your answer downloads the zip to the machine, which is what I don't want to do. – user569685 Jul 25 '23 at 15:04
  • I don't think lambda can write a function zipfile to s3 directly. You'll have to transfer it between the two yourself. – erik258 Jul 25 '23 at 15:05
  • I know lambda cannot do it. I'm wondering if AWS has a way of downloading the file directly to s3, using just the download link and a few api calls. – user569685 Jul 25 '23 at 15:06
  • Downloading Lambda functions from the AWS Lambda service is arguably an anti-pattern. The function should have originally been uploaded from some source of truth code repo/pipeline. – jarmod Jul 25 '23 at 17:28
  • You can stream via a client as shown [here](https://stackoverflow.com/a/60662307/12555857) – rkochar Aug 04 '23 at 12:56
  • @rkochar Thank you for that suggestion, but if I go the path of streaming, in my case, I might as well download the file. – user569685 Aug 05 '23 at 20:46

2 Answers


The only way to retrieve your function package is to call GetFunction, which returns a link, valid for 10 minutes, to download the zip file.

I guess if you really want to keep this serverless, you will want to build a Lambda function that calls GetFunction, downloads the package, and uploads it to S3 with PutObject.

Then you can invoke that function programmatically or via the CLI.
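A minimal sketch of what such a function could look like (the handler name, event shape, and environment variable are placeholders; it assumes the execution role has lambda:GetFunction and s3:PutObject permissions):

import os
import urllib.request

import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')


def handler(event, context):
    # Hypothetical event shape: {"function_name": "...", "bucket": "..."}
    function_name = event['function_name']
    bucket = event.get('bucket', os.environ.get('TARGET_BUCKET'))

    # GetFunction returns a pre-signed URL (valid for about 10 minutes)
    # under Code.Location.
    url = lambda_client.get_function(FunctionName=function_name)['Code']['Location']

    # Stream the package straight from the pre-signed URL into S3;
    # nothing is written to local disk.
    with urllib.request.urlopen(url) as body:
        s3_client.upload_fileobj(body, bucket, f'{function_name}.zip')

    return {'bucket': bucket, 'key': f'{function_name}.zip'}

You could then trigger it with boto3's invoke call or aws lambda invoke from the CLI, passing the target function name in the payload.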

Alexandre Abreu
  • This is a solution but I'd rather avoid additional architecture. I'm looking to accomplish this with API calls only. I want AWS to do the work of how it's handled. – user569685 Jul 26 '23 at 11:14

You could use the aws s3 cp - command, which can stream from standard input to a specified bucket and key, and combine it with aws lambda get-function.

For example, this will stream your function's package directly to S3.

curl $(aws lambda get-function --function-name <YOUR-FUNCTION-NAME> \
    | jq -r ".Code.Location") \
    | aws s3 cp - s3://<YOUR-BUCKET>/<YOUR-FUNCTION-NAME>.zip

curl in this context does not save the file locally. Instead, it streams the data directly to stdout, which is then piped as stdin to the aws s3 cp - command.

Or, if you're using boto3, you could combine it with requests with stream set to True.

Sample code:

import boto3
import requests
from botocore.exceptions import NoCredentialsError


def stream_lambda_to_s3(
        lambda_function_name: str,
        bucket_name: str,
        s3_key: str,
) -> None:

    lambda_client = boto3.client('lambda')
    s3_client = boto3.client('s3')

    # GetFunction returns a pre-signed URL to the deployment package
    # under Code.Location; the link is only valid for about 10 minutes.
    response = lambda_client.get_function(FunctionName=lambda_function_name)
    presigned_url = response['Code']['Location']

    # Stream the download; r.raw is a file-like object, so upload_fileobj
    # can forward it to S3 without buffering the whole zip in memory.
    with requests.get(presigned_url, stream=True) as r:
        r.raise_for_status()
        try:
            s3_client.upload_fileobj(r.raw, bucket_name, s3_key)
            print(
                f"Successfully uploaded {lambda_function_name} "
                f"to s3://{bucket_name}/{s3_key}"
            )
        except NoCredentialsError:
            print("Credentials not available")


if __name__ == "__main__":
    function_name = 'YOUR_LAMBDA_FUNCTION_NAME'
    target_bucket = 'YOUR_BUCKET_NAME'
    s3_key = f'{function_name}.zip'
    
    stream_lambda_to_s3(function_name, target_bucket, s3_key)

Or you could use the Go SDK's Upload Manager, which provides concurrent upload of content to S3 by taking advantage of S3's multipart APIs.

The code has been adapted from this answer.

I've added:

  • CLI flags
  • http.Get, to use resp.Body instead of reading from a local file

To build this, just run:

go build -o stream main.go

And then, use it like this:

./stream --lambda-name YOUR-LAMBDA --bucket-name YOUR-BUCKET --key-name YOUR-NAME.zip

There's an additional --region flag that defaults to eu-central-1, so there's no need to supply it; if your region is different, just pass it explicitly.

package main

import (
    "flag"
    "fmt"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/lambda"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
    "io"
    "net/http"
)

var lambdaName string
var bucketName string
var keyName string
var defaultRegion string

func main() {
    flag.StringVar(
        &lambdaName,
        "lambda-name",
        "",
        "Name of the Lambda function",
    )
    flag.StringVar(
        &bucketName,
        "bucket-name",
        "",
        "Name of the S3 bucket",
    )
    flag.StringVar(
        &keyName,
        "key-name",
        "",
        "Key name for the S3 object",
    )
    flag.StringVar(
        &defaultRegion,
        "region",
        "eu-central-1",
        "AWS Region",
    )
    flag.Parse()

    if lambdaName == "" || bucketName == "" || keyName == "" {
        fmt.Println("All flags are required.")
        return
    }

    awsConfig := &aws.Config{
        Region: aws.String(defaultRegion),
    }

    // Get Lambda function details
    sess := session.Must(session.NewSession(awsConfig))
    lambdaService := lambda.New(sess)
    lambdaOutput, err := lambdaService.GetFunction(&lambda.GetFunctionInput{
        FunctionName: &lambdaName,
    })
    if err != nil {
        fmt.Printf("Failed to fetch Lambda function details: %v\n", err)
        return
    }

    resp, err := http.Get(*lambdaOutput.Code.Location)
    if err != nil {
        fmt.Printf("Failed to fetch content from pre-signed URL: %v\n", err)
        return
    }
    defer func(Body io.ReadCloser) {
        err := Body.Close()
        if err != nil {
            fmt.Printf("Failed to close response body: %v\n", err)
        }
    }(resp.Body)

    // Create an uploader with the session and custom options
    uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
        u.PartSize = 5 * 1024 * 1024
        u.Concurrency = 2
    })

    // Upload the streamed content to S3
    result, err := uploader.Upload(&s3manager.UploadInput{
        Bucket: &bucketName,
        Key:    &keyName,
        Body:   resp.Body,
    })

    if err != nil {
        fmt.Printf("Failed to upload content to S3: %v\n", err)
        return
    }
    fmt.Printf("File uploaded to: %s\n", result. Location)
}

A note on DataSync: no, it can't do what you want. From the FAQ:

AWS DataSync supports moving data to, from, or between Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, and Amazon FSx for NetApp ONTAP.

source

baduker