
Quick question: Is it possible to trigger the execution of a Step Functions state machine after an SQS message is sent? If so, how would you specify it in the CloudFormation YAML file?

Thanks in advance.

Carlos

2 Answers

66

The first thing to consider is this: do you really need to use SQS to start a Step Functions state machine? Can you use API Gateway instead? Or could you write your messages to an S3 bucket and use CloudWatch Events to start a state machine?

If you must use SQS, then you will need a Lambda function to act as a proxy. Set up the queue as a trigger for the Lambda function, and write the function so that it parses the SQS message and makes the appropriate call to the Step Functions StartExecution API.
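The proxy described above might look roughly like the following Python handler. This is only a sketch under a few assumptions: the `STATE_MACHINE_ARN` environment variable is a hypothetical name, each SQS message body is assumed to already be JSON text that the state machine accepts as input, and `sfn_client` is an optional parameter added here purely so the logic can be exercised without AWS credentials.

```python
import os


def build_executions(event, state_machine_arn):
    """Map each SQS record in the Lambda event to StartExecution arguments."""
    requests = []
    for record in event.get("Records", []):
        requests.append({
            "stateMachineArn": state_machine_arn,
            # Assumption: the message body is already valid JSON text.
            "input": record["body"],
        })
    return requests


def handler(event, context, sfn_client=None):
    """Lambda entry point: start one execution per SQS message in the batch."""
    if sfn_client is None:
        # Imported lazily so build_executions can be tested without boto3.
        import boto3
        sfn_client = boto3.client("stepfunctions")

    # Hypothetical environment variable holding the state machine ARN.
    arn = os.environ["STATE_MACHINE_ARN"]
    requests = build_executions(event, arn)
    for request in requests:
        # If this raises, the invocation fails and the batch is not
        # removed from the queue, so the messages can be retried.
        sfn_client.start_execution(**request)
    return {"started": len(requests)}
```

The handler only starts executions; it does not wait for them to finish, so failures inside the state machine have to be handled there.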

I’m on mobile, so I can’t type up the YAML right now, but if you need it, I can try to update with it later. For now, here is a detailed walkthrough of how to invoke a Step Functions state machine from Lambda (including example YAML), and here is a walkthrough of how to use CloudFormation to set up SQS to trigger a Lambda.

Matthew Pope
  • This was the best approach. Thanks a lot. – Carlos Jul 25 '19 at 06:58
  • 3
    With this approach, how can you handle errors in Step Functions? When the message is received by the Lambda function, the message will be deleted from SQS. And if an error is raised in the Step Function, neither SQS nor the Lambda function will be notified. I am assuming that in most cases errors will appear in the Step Function and not in the Lambda that calls it – Vladimir Venegas Jul 30 '19 at 16:11
  • 2
    The SQS message is not deleted until the message is successfully processed (i.e., the Step Function is started) by the Lambda function. Inside your Step Function, you would use the same approach for error handling as in any other Step Function. https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html – Matthew Pope Jul 30 '19 at 16:17
  • Is there a different way if the Step Function is long-running (say 30-90 minutes) and the integration should be synchronous, i.e. the SQS message must be deleted only after the SFN completes? I wish SQS provided direct synchronous integration with SFN without the Lambda, as a Lambda cannot run for more than 15 minutes. – Naveen Santhanavel Jun 06 '21 at 03:18
  • @NaveenKumar deleting the message after a long Step Function completes sounds like an anti-pattern because SQS is a queuing mechanism, not a state tracker. I think you may want to ask a new question, and beware of the XY problem when you do. – Matthew Pope Jun 22 '21 at 01:24
  • 1
    @MatthewPope, why would it be an anti-pattern? Looks like a reasonable approach if you want the Step Function to acknowledge/retry the message in case of success/failure, in the same way a Lambda would do. In many cases, retrying individual steps in the SFN may be enough, but I can imagine some other cases where you would want to retry the entire SFN. – cahen Jul 12 '21 at 12:54
  • 1
    Looking at the first question in this answer (do I really need SQS to start a Step Function?), I wonder if rate limiting is a good reason. Say you want to limit the number of concurrent SFN executions, e.g. a scraper that avoids bombarding the website. It would be great to have some criteria to consider. – Shawn Jul 20 '21 at 06:47
  • 2
    If your Step Function completes reasonably quickly (<5 minutes), you can use an Express Step Function as illustrated here: https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-express-high-volume-sqs.html With this approach your jobs will be deleted from the queue only if the Step Function succeeds. Make sure to increase the SQS VisibilityTimeout if the Step Function lasts more than 30 seconds: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html . This will help you avoid triggering the same execution over and over – Luciano Mammino Apr 07 '22 at 17:42
  • 1
    @MatthewPope The message is deleted from the queue __only when explicitly deleted__ from the queue. I only comment here because your comment sounded as if successful completion of the lambda automagically deletes the message. When you are done processing the message, you must explicitly delete it otherwise it becomes visible on the queue again. When a message is retrieved from queue it is only _hidden_ from anybody reading more messages from the queue. It is visible again if the value set for its visibility timeout is reached before it is deleted from queue. – Sinux1 Apr 26 '22 at 19:59
17

EventBridge Pipes (launched at re:Invent 2022) allows you to trigger Step Functions state machines without the need for a Lambda function.

https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes.html

You can find an example here:

https://github.com/aws-samples/aws-stepfunctions-examples/blob/main/sam/demo-trigger-stepfunctions-from-sqs/template.yaml
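As a rough illustration of what such a pipe connects, the sketch below builds the request for the `create_pipe` call of the boto3 `pipes` client. It is not taken from the linked template: the helper names, the batch size of 1, and the choice of `FIRE_AND_FORGET` invocation are assumptions for illustration, and the IAM role is assumed to already exist with permission to read the queue and start the state machine. To my understanding, the `AWS::Pipes::Pipe` CloudFormation resource uses the same property names, but check the linked template before relying on that.

```python
def build_pipe_request(pipe_name, role_arn, queue_arn, state_machine_arn):
    """Build create_pipe arguments for an SQS -> Step Functions pipe."""
    return {
        "Name": pipe_name,
        "RoleArn": role_arn,
        "Source": queue_arn,
        "SourceParameters": {
            # Assumption: deliver one message per state machine execution.
            "SqsQueueParameters": {"BatchSize": 1},
        },
        "Target": state_machine_arn,
        "TargetParameters": {
            "StepFunctionStateMachineParameters": {
                # Assumption: start the execution without waiting for its result.
                "InvocationType": "FIRE_AND_FORGET",
            },
        },
    }


def create_pipe(pipe_name, role_arn, queue_arn, state_machine_arn):
    """Create the pipe; requires boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above runs without boto3
    client = boto3.client("pipes")
    request = build_pipe_request(pipe_name, role_arn, queue_arn, state_machine_arn)
    return client.create_pipe(**request)
```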

Justin Callison
  • This is the way to go nowadays. I was concerned that it might be expensive, but I checked [pricing](https://aws.amazon.com/eventbridge/pricing/#:~:text=5.00%20per%20month-,Pipes,-Amazon%20EventBridge%20Pipes) and it's not expensive at all. I'd be surprised if it cost any more than using Lambda in the middle. – trademark Jan 17 '23 at 14:34
  • 2
    For anyone looking for a CDK implementation: https://github.com/aws/aws-cdk-rfcs/issues/473 is still in progress. – Capan Jan 18 '23 at 15:52
  • In addition to @Capan's comment, there is still an L1 construct you can use. https://docs.aws.amazon.com/cdk/api/v1/docs/aws-pipes-readme.html – Rizxcviii Feb 10 '23 at 14:01