
I am trying to decrypt files that arrive periodically in our S3 bucket. How can I process a file if it is huge (e.g. 10 GB), since the computing resources of Lambda are limited? I'm not sure whether it is necessary to download the whole file into Lambda and perform the decryption there, or whether there is some other way to chunk the file and process it.

Edit: Processing the file here includes decrypting the file, then parsing each row and writing it to a persistent store like a SQL queue or a database.

ArjunPunnam
  • A question should contain the code you have used so far and explain what problem it has. This is too broad a question now. But typically you would not read 10 GB into memory. – Ruan Mendes Jan 28 '20 at 18:10
  • I haven't started writing any code, but would like to implement the solution using S3 and Lambda. The unknown for me here is whether we can process a file from S3 in blocks of a buffer instead of downloading the whole file. I was referring to this answer in particular when I say dividing the file into blocks: https://stackoverflow.com/questions/34447037/encrypting-a-large-file-with-aes-using-java/39336692 – ArjunPunnam Jan 28 '20 at 18:16
  • Then please edit your question to include that code, a link to the other question and the code you have tried. You do have to try something first. – Ruan Mendes Jan 28 '20 at 19:19
  • @Nisantasi What do you mean by "processing"? What will the program be doing with the data? If it is just counting the occurrences of certain keywords, then the file size would not matter. However, if it is storing information from the file, then it would matter. Please edit your question to explain the actual flow you want to achieve (e.g. decrypting, processing, calculating, returning a result, etc.). – John Rotenstein Jan 28 '20 at 23:53
  • @John Thanks for your comment. I have updated the question based on your inputs. – ArjunPunnam Jan 29 '20 at 16:07
  • This is very valuable information, as I am thinking of decrypting the file in chunks, but one of the problems I am thinking about is how to save the decrypted chunk, since S3 doesn't support appending to the same object and the disk space on Lambda isn't enough for files greater than 512 MB. – ArjunPunnam Jan 29 '20 at 18:27
  • I can decrypt the file in chunks and write that decrypted data to a queue or a database, but if the file is huge I might exceed the Lambda timeout limit. – ArjunPunnam Jan 29 '20 at 18:29
  • How often does the file arrive? Another option is to simply have a small Amazon EC2 instance running that can do the processing without the limitations of an AWS Lambda function. That would provide plenty of disk space to decrypt and process the file. – John Rotenstein Jan 29 '20 at 22:08

1 Answer


You can set the byte-range in the GetObjectRequest to load a specific range of bytes from an S3 object.

The following example comes from the official AWS documentation for the S3 GetObject API:

    // Get a range of bytes from an object and print the bytes.
    GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key).withRange(0, 9);
    S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
    System.out.println("Printing bytes retrieved.");
    displayTextInputStream(objectPortion.getObjectContent());
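Here withRange(0, 9) requests bytes 0 through 9 inclusive (the first 10 bytes of the object), mirroring the semantics of an HTTP Range header.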

For more information, you can visit the documentation here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html
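Building on that, one way to handle a file far larger than Lambda's memory or /tmp space is to loop over the object in fixed-size byte ranges and stream each portion as it arrives. The sketch below is only an illustration of that loop in the same SDK v1 style as above; the bucket name, key and 8 MB chunk size are placeholder assumptions, and the decryption/row-parsing step is left as a comment because it depends on how the file was encrypted.

    // Sketch (not from the question): reading an S3 object in ranged chunks so the
    // whole file never has to fit in Lambda memory or on local disk.
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    import java.io.IOException;
    import java.io.InputStream;

    public class RangedS3Reader {

        public static void main(String[] args) throws IOException {
            AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
            String bucketName = "my-bucket";   // assumption: replace with your bucket
            String key = "incoming/huge-file"; // assumption: replace with your key

            // Total object size, needed to know where the last range ends.
            long objectSize = s3Client.getObjectMetadata(bucketName, key).getContentLength();

            long chunkSize = 8L * 1024 * 1024; // 8 MB per ranged GET (tunable)

            for (long start = 0; start < objectSize; start += chunkSize) {
                long end = Math.min(start + chunkSize, objectSize) - 1; // range is inclusive

                GetObjectRequest rangeRequest =
                        new GetObjectRequest(bucketName, key).withRange(start, end);

                try (S3Object objectPortion = s3Client.getObject(rangeRequest);
                     InputStream in = objectPortion.getObjectContent()) {

                    byte[] buffer = new byte[64 * 1024];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        // Feed buffer[0..read) into your decryption and row-parsing
                        // logic here, and write each parsed record to the queue or
                        // database mentioned in the question.
                    }
                }
            }
        }
    }

Keep in mind that byte ranges split the ciphertext at arbitrary offsets, so chunk boundaries will not line up with cipher blocks or rows; the decryption has to be a streaming one that carries its state across chunks (or you can stream a single GetObject response instead of issuing ranged requests). Also, as noted in the comments, if decrypting and processing the whole file cannot finish within Lambda's 15-minute timeout, a small EC2 instance or a different service may be a better fit.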

Mortie