
I am designing an application whose input is a large text file (1 to 30 GB) uploaded to an S3 bucket every 15 minutes. The application splits the file into n smaller files and copies them to 3 S3 buckets in 3 different AWS regions. Then 3 loader applications read these n files from their respective S3 buckets and load the data into the corresponding Aerospike cluster.
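A minimal sketch of the splitting step, assuming the records are newline-delimited so a split must never cut a line in half. The function name `split_by_lines` and the `max_chunk_bytes` parameter are hypothetical. It reads any binary file-like object line by line, so a 1 to 30 GB input is never held in memory at once; reading the object straight from S3 and uploading each chunk to the three regional buckets are left out.

```python
import io
from typing import BinaryIO, Iterator


def split_by_lines(stream: BinaryIO, max_chunk_bytes: int) -> Iterator[bytes]:
    """Yield chunks of at most max_chunk_bytes that always end on a line boundary.

    A single line longer than max_chunk_bytes is emitted as a chunk of its own.
    """
    buf = bytearray()
    for line in stream:  # binary file objects iterate line by line, keeping b"\n"
        if buf and len(buf) + len(line) > max_chunk_bytes:
            yield bytes(buf)  # copy out the finished chunk
            buf.clear()
        buf += line
    if buf:
        yield bytes(buf)  # last, possibly smaller, chunk


# Usage: four 4-byte lines with an 8-byte limit give two chunks of two lines each.
data = b"a,1\nb,2\nc,3\nd,4\n"
chunks = list(split_by_lines(io.BytesIO(data), max_chunk_bytes=8))
print(chunks)
```

Because the splitter is a generator, each chunk can be written out (for example as `part-0`, `part-1`, ...) as soon as it is produced instead of after the whole file has been read.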

I am thinking of using AWS Lambda functions both to split the file and to load the data. I recently came across AWS Step Functions, which, from what I have read, can also serve this purpose. I am not sure which one to go with, or which will be cheaper. Any help is appreciated.

Thanks in advance!

dhamu

2 Answers


Lambda and Step Functions are like floors and the steps between floors. You cannot replace one with another.

Lambda is the compute; Step Functions moves the work on to the desired step.

This YouTube video explains it very well: https://www.youtube.com/watch?v=Dh7h3lkpeP4

Returning to the analogy, you can have multiple compute units (Lambda functions) on a single floor before passing the work on to the next floor.

One example use case: https://john.soban.ski/transcribe-customer-service-voicemails-and-alert-on-keywords.html


Hope it helps.

Kannaiyan
  • -1 for saying `You cannot replace one with another` when it's entirely possible to replace Step Functions with a Lambda. It's probably not a good idea in most cases, but you can construct a state machine inside a Lambda, and if you care about running costs more than anything else it "might" be an idea worth exploring: https://www.readysetcloud.io/blog/allen.helton/lambda-vs-step-functions-breakdown/ – Mariano Desanze Jun 01 '22 at 19:45
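To make the comment's idea concrete, here is a minimal sketch of a state machine run inside a single function, with per-step retries and exponential backoff. `run_state_machine`, the step names `split` and `load`, and the payload fields are all hypothetical. One trade-off to keep in mind: the whole run lives in one Lambda invocation, so it is bounded by Lambda's 15-minute maximum timeout, and you pay for any time spent backing off between retries.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# A step receives the payload and returns (name of next state, or None to stop; new payload).
Step = Callable[[dict], Tuple[Optional[str], dict]]


def run_state_machine(states: Dict[str, Step], start: str, payload: dict,
                      max_attempts: int = 3, backoff_seconds: float = 0.0) -> dict:
    """Run steps from `start`, retrying each failing step with exponential backoff."""
    state: Optional[str] = start
    while state is not None:
        step = states[state]
        for attempt in range(1, max_attempts + 1):
            try:
                state, payload = step(payload)  # only reassigned if the step succeeds
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # out of retries: fail the whole run
                time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return payload


# Hypothetical steps mirroring the question: split the file, then load the parts.
def split(p: dict) -> Tuple[Optional[str], dict]:
    parts = [f"{p['key']}.part{i}" for i in range(p["n"])]
    return "load", {**p, "parts": parts}


def load(p: dict) -> Tuple[Optional[str], dict]:
    return None, {**p, "loaded": len(p["parts"])}


result = run_state_machine({"split": split, "load": load}, "split", {"key": "big.txt", "n": 3})
print(result["loaded"])
```

What Step Functions adds over this is durability: each step is a separate invocation with its own timeout, and the execution history survives a crashed function.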

Step Functions is excellent at coordinating workflows that involve multiple predefined steps. It handles parallel tasks and error handling well, and it mainly uses Lambda functions to perform each task.
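As an illustration of the parallel and error-handling features, here is a sketch of an Amazon States Language definition for the question's pipeline, built as a Python dict and serialized with `json.dumps`. The account ID, function names, region list, `MaxConcurrency` and retry values are hypothetical, and the layout (split once, then one `Map` per region inside a `Parallel` state) is one possible design, not the only one. Invoking loader functions in other regions from a single state machine is an assumption to check against the Step Functions documentation; one state machine per region is an alternative.

```python
import json

# Hypothetical placeholders: account ID, function names and regions are not real resources.
ACCOUNT = "123456789012"
SPLIT_ARN = f"arn:aws:lambda:us-east-1:{ACCOUNT}:function:split-file"
LOADER_ARNS = {
    region: f"arn:aws:lambda:{region}:{ACCOUNT}:function:load-aerospike"
    for region in ("us-east-1", "us-west-2", "eu-west-1")
}

# Retry any error up to 3 times with exponential backoff (values are illustrative).
RETRY = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 2,
          "MaxAttempts": 3, "BackoffRate": 2.0}]


def loader_branch(region: str, arn: str) -> dict:
    """One Parallel branch: a Map state that runs the loader once per split file."""
    map_name = f"LoadFiles-{region}"
    return {
        "StartAt": map_name,
        "States": {
            map_name: {
                "Type": "Map",
                "ItemsPath": "$.files",  # assumes the split step returns {"files": [...]}
                "MaxConcurrency": 10,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "LoadFile",
                    "States": {
                        "LoadFile": {"Type": "Task", "Resource": arn,
                                     "Retry": RETRY, "End": True},
                    },
                },
                "End": True,
            }
        },
    }


definition = {
    "Comment": "Split a large upload, then load it into three regional clusters",
    "StartAt": "SplitFile",
    "States": {
        "SplitFile": {"Type": "Task", "Resource": SPLIT_ARN,
                      "Retry": RETRY, "Next": "LoadAllRegions"},
        "LoadAllRegions": {
            "Type": "Parallel",
            "Branches": [loader_branch(r, a) for r, a in LOADER_ARNS.items()],
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

If any loader still fails after its retries, the whole execution fails and shows up in the execution history, which is the coordination that separate S3-triggered functions do not give you.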

Based on your use case, Step Functions sounds like a good fit. As for pricing, it adds a very small charge on top of Lambda; based on your description, I doubt you'd even notice the additional cost. You'd need to evaluate that based on the number of "state transitions" your workflow uses. Of course, you'll also have to pay for your Lambda invocations.
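A rough way to do that evaluation, assuming Standard Workflows (billed per state transition) and a split, then per-region `Map`, then load layout. The per-run transition tally is an assumption that ignores retries (each retry adds transitions), and the price is deliberately a parameter: take it from the current AWS pricing page rather than from this sketch. The 96 runs per day follows from one upload every 15 minutes, and the `1 + 3n` invocation count matches the n*3 loader calls described in the comments below.

```python
from dataclasses import dataclass

RUNS_PER_DAY = 24 * 60 // 15  # one upload every 15 minutes -> 96 runs per day


@dataclass(frozen=True)
class RunCounts:
    lambda_invocations: int
    state_transitions: int


def counts_per_run(n_files: int, regions: int = 3) -> RunCounts:
    """Rough per-run counts for a split -> per-region Map -> load layout (retries ignored)."""
    invocations = 1 + regions * n_files  # one split call plus n loader calls per region
    # Assumed tally: SplitFile + Parallel, then per region one Map state plus n LoadFile tasks.
    transitions = 2 + regions * (1 + n_files)
    return RunCounts(invocations, transitions)


def monthly_transition_cost(n_files: int, price_per_transition: float,
                            days: int = 30, regions: int = 3) -> float:
    """Step Functions (Standard) surcharge only; Lambda charges come on top."""
    per_run = counts_per_run(n_files, regions).state_transitions
    return per_run * RUNS_PER_DAY * days * price_per_transition


c = counts_per_run(100)
print(c)
```

The Lambda part of the bill (invocations plus duration) is roughly the same whether or not Step Functions orchestrates the calls, so the transition count is the figure that actually differs between the two designs.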

Justin Howard
  • Thank you for your explanation. In my initial design I was thinking of triggering a Lambda whenever an object is created in S3. For example, when a large file is posted to the S3 bucket, it triggers a Lambda that splits the file into n files and copies them to 3 different S3 buckets. Then a loader Lambda function associated with each of the 3 buckets is triggered whenever a file is copied to that bucket. So if there are n files, the total number of Lambda invocations in the second step will be n*3. Can this be done in a better way by adding Step Functions? – dhamu Jan 11 '19 at 16:08
  • If you don't need to coordinate Lambda functions and communicate between them, you don't need Step Functions. I can't say for your specific use case without knowing all the details. I suggest you take a look at some example use cases of Step Functions. If you don't need it, don't add complexity for no benefit. – Justin Howard Jan 11 '19 at 17:02
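For comparison, the purely event-driven design from dhamu's comment needs only one small handler per regional bucket: S3 invokes it once per created object, which is where the n*3 invocations come from. This is a minimal sketch; `load_into_aerospike` is a hypothetical placeholder for the real loader, and the extra `loader` parameter exists only so the handler can be tested without AWS (Lambda itself calls `handler(event, context)`). The record fields read here follow the S3 event notification format.

```python
from urllib.parse import unquote_plus


def load_into_aerospike(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the real loader (read s3://bucket/key, write to Aerospike)."""
    raise NotImplementedError


def handler(event, context, loader=load_into_aerospike):
    """Lambda entry point for S3 ObjectCreated notifications; one loader call per record."""
    loaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (e.g. spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        loader(bucket, key)
        loaded.append(f"s3://{bucket}/{key}")
    return {"loaded": loaded}
```

In this design there is no single place that knows when all n files have been loaded in every region; if you need that signal, or retries with a final success or failure status, that is the coordination Step Functions provides.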