
I have multiple requests from different projects, each with S3 paths that need to be copied to a destination.

I am currently using a single-host multithreading function to process the requests, and I want to move to a serverless approach using Lambda.

I plan to use

  • a Batching Lambda to split the source files into smaller batches and send multiple messages to an SQS queue,
  • a Processor Lambda, triggered by the above SQS queue with auto-scaling/concurrency enabled, to transfer the files asynchronously,
  • a Consolidation Lambda to generate reports, consolidate errors, and send a notification for each request.

I need help with triggering the Consolidation Lambda after each request completes, and suggestions on where to store the data for report generation (e.g. number of files requested, transferred, and skipped).

Questions:

  1. How can I trigger the Consolidation Lambda after each request is completed?

    • Since the Processor Lambda is triggered by SQS, if there are messages from another project, the Lambda will still be running for that other request even after one request is complete, so I'm not sure whether any orchestration tool can help.
    • The requests can be concurrent and overlapping.
  2. Any suggestions on where the data from the function should be stored for generating the report? I'm currently thinking of using another queue/topic to store the data and reading from there when the Consolidation Lambda is triggered (see the sketch after this list).

  3. Any shortcomings/challenges in this plan? What would be a better approach?
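To illustrate question 2, this is roughly what I have in mind: the Processor Lambda publishes one small result record per batch to a separate results queue, and the Consolidation Lambda aggregates them later. A minimal sketch — the queue URL variable and field names are placeholders, not settled choices:

import json
import os

import boto3

sqs = boto3.client("sqs")

# Placeholder: a results queue separate from the work queue.
RESULTS_QUEUE_URL = os.environ["RESULTS_QUEUE_URL"]

def publish_batch_result(request_trigger_time, batch_num, transferred, skipped, errors):
    # One record per processed batch; request_trigger_time identifies the request.
    sqs.send_message(
        QueueUrl=RESULTS_QUEUE_URL,
        MessageBody=json.dumps({
            "request_trigger_time": request_trigger_time,
            "batch_num": batch_num,
            "transferred": transferred,
            "skipped": skipped,
            "errors": errors,
        }),
    )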

Below is the message format I'm considering for each SQS batch:

{
    'source_bucket': source_bucket,
    'source_prefix': source_prefix,
    'destination_bucket': destination_bucket,
    'destination_prefix': destination_prefix,
    'files': current_batch_list,
    'metadata': metadata,
    'batch_num': current_batch_num,
    'request_trigger_time': request_trigger_time
}
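And a rough sketch of the Batching Lambda that would emit those messages — BATCH_SIZE, the queue URL variable, and the incoming event shape are assumptions I haven't settled on:

import json
import os

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BATCH_SIZE = 100  # assumed; to be tuned for typical file counts/sizes
WORK_QUEUE_URL = os.environ["WORK_QUEUE_URL"]  # placeholder queue URL

def lambda_handler(event, context):
    # List every key under the source prefix (handles pagination).
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(
        Bucket=event["source_bucket"], Prefix=event["source_prefix"]
    ):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    # Send one SQS message per batch of keys.
    total_batches = (len(keys) + BATCH_SIZE - 1) // BATCH_SIZE
    for batch_num in range(1, total_batches + 1):
        start = (batch_num - 1) * BATCH_SIZE
        sqs.send_message(
            QueueUrl=WORK_QUEUE_URL,
            MessageBody=json.dumps({
                "source_bucket": event["source_bucket"],
                "source_prefix": event["source_prefix"],
                "destination_bucket": event["destination_bucket"],
                "destination_prefix": event["destination_prefix"],
                "files": keys[start:start + BATCH_SIZE],
                "metadata": event.get("metadata", {}),
                "batch_num": batch_num,
                "request_trigger_time": event["request_trigger_time"],
            }),
        )
    return {"files_requested": len(keys), "batches_sent": total_batches}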
Hemanth

1 Answer


I haven't tried it myself, but it looks like you can use the AWS Step Functions Map capability.

From Map - AWS Step Functions:

Use the Map state to run a set of workflow steps for each item in a dataset. The Map state's iterations run in parallel, which makes it possible to process a dataset quickly.

Thus, the 'batching' step could trigger a Map, which causes the 'processor' steps to run in parallel. Then, when the processor steps are all complete, the 'consolidation' step can occur as the next task.
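I haven't tested this, but the state machine might look roughly like the following. The ARNs are placeholders, and it assumes the batching step returns a "batches" array; with the Map approach the batches would flow through Step Functions state rather than SQS:

import json

# Sketch only: ARNs are placeholders, and the batching Lambda is assumed to
# return {"batches": [...]} with one element per processor invocation.
definition = json.dumps({
    "StartAt": "Batching",
    "States": {
        "Batching": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:batching",
            "ResultPath": "$.batching",
            "Next": "ProcessBatches",
        },
        "ProcessBatches": {
            "Type": "Map",
            "ItemsPath": "$.batching.batches",  # one iteration per batch
            "MaxConcurrency": 10,               # caps parallel processor runs
            "Iterator": {
                "StartAt": "Processor",
                "States": {
                    "Processor": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:processor",
                        "End": True,
                    },
                },
            },
            "ResultPath": "$.results",  # Map collects every iteration's output
            "Next": "Consolidation",
        },
        "Consolidation": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:consolidation",
            "End": True,
        },
    },
})

Since the Map state collects each iteration's output into $.results, the consolidation step would receive the per-batch counts in its input, which may remove the need for a separate results queue. Running one execution per request would also keep concurrent, overlapping requests isolated from each other.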

See: Parallel States Merge the output in Step Function

John Rotenstein