Say files are being dropped every hour or so into an S3 location. Can the file info be pushed to a queue or something similar, so that those files can be processed by some other AWS resource, maybe Lambda for simplicity here? The files need not be processed individually; they can be processed in a batch, say 100 at a time. Is there a way to trigger the job once the file count reaches 100 in some queue that is maintained as files come through?
One way I can think of: enable S3 access logs >> invoke a Lambda from the create events >> send metrics to AWS SQS >> process a batch of them using your favorite tool or service. Please let me know. – hopeIsTheonlyWeapon Apr 14 '22 at 02:27
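A minimal sketch of that comment's flow, assuming the Lambda is subscribed to s3:ObjectCreated:* events and that QUEUE_URL points at an existing SQS queue (both are placeholders, not values from this page):
import json
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/new-files'  # placeholder

def lambda_handler(event, context):
    # Forward each newly created object's bucket/key to SQS so a
    # downstream consumer can pick them up in batches
    for record in event['Records']:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                'bucket': record['s3']['bucket']['name'],
                'key': record['s3']['object']['key'],
            })
        )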
1 Answer
There are a few ways you could do this. I'll suggest the simplest, most maintainable and cost-effective one.
First, add an S3 bucket, a Lambda function, and a trigger as documented here: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
Using the example Python code:
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))
    # Get the bucket name from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
You want to find out the number of objects, e.g. 100, and to do that you could use the CLI (there are a number of ways: https://stackoverflow.com/a/64486330/495455):
aws s3 ls s3://bucketName/path/ --recursive --summarize | grep "Total Objects:"
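If you'd rather stay in the Lambda's own language, here is a boto3 equivalent of that CLI count, as a sketch (bucket and prefix are placeholders); list_objects_v2 returns at most 1,000 keys per call, hence the paginator:
import boto3

s3 = boto3.client('s3')

def count_objects(bucket, prefix=''):
    # Sum KeyCount across pages; each page holds up to 1,000 keys
    paginator = s3.get_paginator('list_objects_v2')
    return sum(page.get('KeyCount', 0)
               for page in paginator.paginate(Bucket=bucket, Prefix=prefix))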
Or using the S3 API, in C# pseudo-code:
var response = s3.ListObjects(new ListObjectsRequest {
    BucketName = "",
    Marker = ""
});
var objectsCount = response.S3Objects.Count;
If the count has reached 100, process the files.
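Putting the pieces together, a sketch of how the Lambda handler could enforce that threshold (BATCH_SIZE and process_batch are hypothetical names, not part of the original answer):
import boto3

s3 = boto3.client('s3')
BATCH_SIZE = 100  # the threshold from the question

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    # Collect all keys currently in the bucket
    paginator = s3.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    if len(keys) >= BATCH_SIZE:
        process_batch(keys[:BATCH_SIZE])

def process_batch(keys):
    # Placeholder: kick off the real batch job here (e.g. a Glue job,
    # Step Functions execution, or ECS task over these keys)
    print(f"Processing {len(keys)} objects")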

– Jeremy Thompson