My Current Situation:
I have a Python script that fetches data from HTTP endpoints and generates hundreds to thousands of reports daily. It currently runs on an AWS EC2 instance, where a queue splits the reports to be generated across four worker threads. Four at a time, the script fetches data, computes each report, and saves the result to a PostgreSQL database on Amazon RDS.
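For context, the worker/queue structure looks roughly like this; it's a simplified sketch, and the endpoint URL, report IDs, and the fetch/compute/save helpers are placeholders rather than my real code:

```python
import queue
import threading

import requests  # HTTP client used to hit the data endpoints

NUM_WORKERS = 4
work_queue = queue.Queue()

def fetch_data(report_id):
    # Placeholder endpoint; the real script hits our actual HTTP endpoints.
    return requests.get(f"https://api.example.com/reports/{report_id}", timeout=30).json()

def compute_report(data):
    return data  # stand-in for the real calculations

def save_to_rds(report_id, result):
    pass  # stand-in for the INSERT into PostgreSQL on Amazon RDS

def worker():
    # Each thread pulls report IDs off the shared queue until it
    # sees the None sentinel, then exits.
    while True:
        report_id = work_queue.get()
        if report_id is None:
            break
        result = compute_report(fetch_data(report_id))
        save_to_rds(report_id, result)
        work_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for report_id in range(1, 1001):  # stand-in for the day's report list
    work_queue.put(report_id)
for _ in threads:
    work_queue.put(None)  # one sentinel per worker so each thread stops
for t in threads:
    t.join()
```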
The Problem:
As the project scales, the script won't be able to compute fast enough to generate all the reports it needs within a day using the current method.
Looking For a Solution:
I stumbled across AWS Lambda, but I haven't found anyone using it for a use case similar to mine. My plan would be to put each report that needs to be generated into its own S3 bucket, then have a Lambda function trigger when the bucket is created. The Lambda function would do all the data fetching (from the HTTP endpoints) and all the calculations, then save the result as a row in my PostgreSQL database on Amazon RDS. In theory, this would make everything run in parallel and eliminate the need for a queue waiting for resources to be freed up.
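In sketch form, the handler I'm imagining would look something like this. This assumes a standard S3 object-created trigger; the endpoint URL, table schema, and compute_report are placeholders, and requests/psycopg2 would need to be bundled in the deployment package or a Lambda layer:

```python
import json
import os

import requests  # would need to be bundled in the deployment package or a layer
import psycopg2  # likewise; the pure-Python pg8000 driver is an alternative

def compute_report(data):
    return data  # stand-in for the real calculations

def lambda_handler(event, context):
    # S3 event notifications fire per object; each record identifies
    # the bucket/key for one report to generate.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Fetch the input data over HTTP (placeholder endpoint).
    data = requests.get(f"https://api.example.com/reports/{key}", timeout=30).json()
    result = compute_report(data)

    # Save one row per report to PostgreSQL on Amazon RDS
    # (connection details come from environment variables).
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO reports (bucket, report_key, payload) VALUES (%s, %s, %s)",
            (bucket, key, json.dumps(result)),
        )
    conn.close()
```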
Basically, I am looking for a solution that makes sure my script can run daily and finish each day without overrunning into the next.
My Questions:
Would AWS Lambda be suitable for something like this?
Would it be costly to do something like this with AWS Lambda (creating hundreds/thousands of S3 buckets a day)?
Are there better options?
Any help, recommendations, insights, or tips are greatly appreciated. Thanks!