0

I am creating an AWS Lambda function that downloads an S3 file and processes it according to the event it receives on each invocation. However, I don't want to download the file from S3 every time. Can anyone suggest how to download the S3 file only once and then process the incoming events without downloading it again?

Currently it downloads the file every time, even though I put the S3 download code in the constructor of the LambdaFunctionHandler class.

If you include any code references or examples, please use Java. Thanks in advance.

stallion

3 Answers

5

If you run several Lambdas in parallel, the execution context is not shared between them, so each concurrent instance has to download the file itself. Store the file in /tmp/, which is limited to 512 MB by default.

However, if invocations run one after another, the context will probably be reused and the file will still exist. Keep cold starts in mind: a new context starts with an empty /tmp/.

Extracted from the AWS Lambda documentation:

After a Lambda function is executed, AWS Lambda maintains the execution context for some time in anticipation of another Lambda function invocation. In effect, the service freezes the execution context after a Lambda function completes, and thaws the context for reuse, if AWS Lambda chooses to reuse the context when the Lambda function is invoked again. This execution context reuse approach has the following implications:

  • Objects declared outside of the function's handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We suggest adding logic in your code to check if a connection exists before creating one.
  • Each execution context provides 512 MB of additional disk space in the /tmp directory. The directory content remains when the execution context is frozen, providing transient cache that can be used for multiple invocations. You can add extra code to check if the cache has the data that you stored. For information on deployment limits, see AWS Lambda Limits.
  • Background processes or callbacks initiated by your Lambda function that did not complete when the function ended resume if AWS Lambda chooses to reuse the execution context. You should make sure any background processes or callbacks in your code are complete before the code exits.
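The "check if the cache has the data" advice above could be sketched roughly as follows. This is only a minimal illustration, not an AWS API: `Downloader` and `TmpFileCache` are hypothetical names introduced here, and the actual S3 call is passed in as a callback so the caching logic stays independent of the SDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical callback that writes the S3 object to the given path. */
interface Downloader {
    void downloadTo(Path target) throws IOException;
}

/** Downloads a file only when this execution context has no copy of it yet. */
final class TmpFileCache {
    private TmpFileCache() {}

    /** Returns true if a download happened, false if the cached copy was reused. */
    static synchronized boolean ensureCached(Path target, Downloader downloader) {
        if (Files.isRegularFile(target)) {
            return false; // warm context: the file survived from an earlier invocation
        }
        try {
            // Write to a sibling ".part" file first, so an interrupted download
            // is never mistaken for a complete cache entry.
            Path partial = target.resolveSibling(target.getFileName() + ".part");
            downloader.downloadTo(partial);
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not cache " + target, e);
        }
    }
}
```

With the S3 client from the example in this answer, the handler could call something like `TmpFileCache.ensureCached(Paths.get("/tmp/example.png"), p -> s3client.getObject(getObjectRequest, p.toFile()))` at the start of each invocation. After a cold start the file is missing and gets downloaded again, so this reduces downloads but does not guarantee exactly one.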

Example code for downloading an object from S3:

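import java.io.File;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

// Build an S3 client for the bucket's region
AmazonS3 s3client = AmazonS3ClientBuilder
        .standard()
        .withRegion(Regions.EU_WEST_1)
        .build();

// Download the object to /tmp; the bucket name is read from the "bucket" environment variable
GetObjectRequest getObjectRequest = new GetObjectRequest(System.getenv("bucket"), "key");
s3client.getObject(getObjectRequest, new File("/tmp/example.png"));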

EDIT 1: Lambda, and serverless in general, is not recommended for apps that need to maintain state between invocations.

WaterKnight
  • So I guess database connections get established every time as well, if the requests/events to AWS Lambda are not continuous. I still feel there should be a way to maintain state, at least for AWS resources/services. – stallion Dec 02 '19 at 10:52
  • Yes, in that case a database connection is established each time. Lambda functions are not recommended for apps that need to maintain state. – WaterKnight Dec 02 '19 at 10:59
  • The problem is that the Lambda context is shared by some subsequent requests but not by every one. So you could write code that checks whether the file exists and, if it does, skips the download. However, I don't recommend it, because serverless and Lambda are not meant for applications that need to maintain state between invocations. – WaterKnight Dec 02 '19 at 11:07
2

Do you mean you want to download the file only once while the Lambda is warm? VPC-based Lambda functions are kept warm for 15 minutes.

If so, call the download function outside of the handler function; that code will then be executed only once while the Lambda is warm.

Objects declared outside of the function's handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We suggest adding logic in your code to check if a connection exists before creating one.

https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html
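As a rough sketch of keeping the downloaded data "outside of the handler", the file contents can be held in a static field that is filled on first use. Everything here is illustrative: `Lazy`, `CachedFileHandler`, `loadFile` and the `loads` counter are hypothetical names, the S3 download is replaced by a placeholder, and a real handler would normally implement `RequestHandler` from aws-lambda-java-core.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Loads a value on first use and keeps it for later calls. */
final class Lazy<T> {
    private final Supplier<T> loader;
    private T value; // null until the first successful load

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (value == null) {
            // If the loader throws, value stays null and the next call retries.
            value = loader.get();
        }
        return value;
    }
}

/** Hypothetical handler class; the handler method signature is simplified. */
final class CachedFileHandler {
    static int loads = 0; // only here to show how often the load runs

    // Static field: lives as long as the execution context, not just one invocation.
    private static final Lazy<byte[]> FILE = new Lazy<>(CachedFileHandler::loadFile);

    private static byte[] loadFile() {
        loads++;
        // Placeholder for the real S3 download.
        return "file contents".getBytes(StandardCharsets.UTF_8);
    }

    String handleRequest(String event) {
        byte[] data = FILE.get(); // downloaded at most once per warm context
        return event + ":" + data.length;
    }
}
```

Loading lazily inside `get()` rather than in a static initializer means a failed download surfaces as an ordinary exception in that invocation and can be retried on the next one. The cache is still lost whenever Lambda creates a new execution context.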

Arun Kamalanathan
0

I think you can use a static block in Java; that block of code is executed only once when the class is loaded, something like this:

// code from https://www.geeksforgeeks.org/g-fact-79/
class Test { 
    static int i; 
    int j; 
    static { 
        i = 10; 
        // File download logic here , will be called only once
        System.out.println("static block called "); 
    } 
    Test(){ 
        System.out.println("Constructor called"); 
    } 
} 

class Main { 
    public static void main(String args[]) { 

       // Although we have two objects, static block is executed only once. 
       Test t1 = new Test(); 
       Test t2 = new Test(); 
    } 
} 
cslrnr