
Background

I have an API Gateway endpoint, which proxies to a Lambda function (Lambda A), for my React application to fetch customer data.

This Lambda function makes an API call to fetch the customer data, but the format of the response leaves a lot to be desired, so I want to reformat it.

Rather than stuff this reformatting logic into Lambda A, I wrote a separate Lambda function (Lambda B). I need to invoke both of these functions when my API Gateway endpoint is hit, and the output from the first is the input to the second.

First thought: Step Functions

Step Functions seemed like a natural fit, but there is a 32 KB limit on the size of the data payload that can be passed between states. Our JSON blob of customer data often exceeds this.

The only "best practice" I've heard offered for this situation is to write the payload to S3, and just pass the object key to the next stage.

This is fine, but I'm not thrilled about having to write and delete so many short-lived objects in S3. There may be dozens or even hundreds of thousands of these requests per day, so I've abandoned the Step Functions approach (for now).
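The S3 workaround described above is often called the claim-check pattern: when a payload is too large to pass between states, store it and pass only a key. A minimal sketch, assuming the 32 KB limit stated in the question and using an in-memory `Map` as a hypothetical stand-in for an S3 bucket (the comments mark where real S3 calls would go; nothing here is the AWS SDK):

```javascript
// Limit as stated in the question; check the current Step Functions quota before relying on it.
const MAX_INLINE_BYTES = 32 * 1024;

// Hypothetical in-memory stand-in for an S3 bucket: object key -> JSON string.
const fakeBucket = new Map();
let nextId = 0;

// Output side of a state: pass the payload inline when it fits,
// otherwise store it and pass only a reference (the object key).
function toStateOutput(payload) {
  const json = JSON.stringify(payload);
  if (Buffer.byteLength(json, "utf8") <= MAX_INLINE_BYTES) {
    return { inline: payload };
  }
  const key = `payloads/${++nextId}.json`; // hypothetical key scheme
  fakeBucket.set(key, json); // real code: S3 PutObject
  return { s3Key: key };
}

// Input side of the next state: resolve the reference back into the payload,
// then delete the short-lived object so it doesn't accumulate.
function fromStateInput(input) {
  if ("inline" in input) return input.inline;
  const json = fakeBucket.get(input.s3Key); // real code: S3 GetObject
  fakeBucket.delete(input.s3Key); // real code: S3 DeleteObject
  return JSON.parse(json);
}
```

Small payloads never touch storage, so only the oversized responses pay the write/read/delete cost. Instead of the explicit delete, an S3 lifecycle rule that expires objects under a prefix (here the hypothetical `payloads/`) is another way to clean up.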

Current Approach

I'm currently invoking Lambda B directly from Lambda A using the JavaScript SDK. This has a fair amount of downside, notably that two Lambdas are sometimes running concurrently with no performance benefit. In other words, I'm paying for Lambda A to sit idle while it waits for the response from Lambda B (which I'm also paying for).

It feels like an anti-pattern, and I've heard it characterized as such.
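For reference, the direct-invocation setup described above looks roughly like this sketch. To keep it runnable without AWS credentials, the Lambda client is injected as a plain `invoke(functionName, payloadBytes)` function; in a real deployment it would wrap the AWS SDK for JavaScript v3 call `lambdaClient.send(new InvokeCommand({ FunctionName, Payload }))` from `@aws-sdk/client-lambda`. The function name `customer-formatter`, the `fetchCustomer` helper, and the field names are hypothetical.

```javascript
// Hypothetical stand-in for the upstream customer API that Lambda A calls.
async function fetchCustomer(customerId) {
  return { CUST_ID: customerId, CUST_NM: "ADA LOVELACE" };
}

// Lambda A, with the Lambda client injected as `invoke(functionName, payloadBytes)`.
// In production, `invoke` would wrap
//   lambdaClient.send(new InvokeCommand({ FunctionName, Payload }))
// and return the response's Payload bytes.
function makeLambdaA(invoke) {
  return async function handler(event) {
    const raw = await fetchCustomer(event.pathParameters.id);
    // Lambda A keeps running (and billing) while it waits for Lambda B.
    const resultBytes = await invoke(
      "customer-formatter", // hypothetical name of Lambda B
      new TextEncoder().encode(JSON.stringify(raw))
    );
    const formatted = JSON.parse(new TextDecoder().decode(resultBytes));
    return { statusCode: 200, body: JSON.stringify(formatted) };
  };
}

// Hypothetical Lambda B: reshapes the raw record.
async function lambdaB(raw) {
  return { id: raw.CUST_ID, name: raw.CUST_NM };
}

// Local fake of the synchronous Invoke call: decode, run Lambda B, encode the result.
async function fakeInvoke(functionName, payload) {
  if (functionName !== "customer-formatter") {
    throw new Error(`unknown function: ${functionName}`);
  }
  const result = await lambdaB(JSON.parse(new TextDecoder().decode(payload)));
  return new TextEncoder().encode(JSON.stringify(result));
}
```

The `await invoke(...)` line is where the double billing happens: both functions are alive until Lambda B returns.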

The Question(s)

This seems like a relatively common scenario: make an API call (function A), then execute some additional logic to supplement, reformat, or otherwise modify that response (function B), before passing it back to the caller.

Surely I'm not the first person to want to use two Lambda functions to do something like this.

  • What are my options for doing this with two Lambda functions, assuming I can't use Step Functions?

  • Are there other ways to work around Step Functions' 32 KB payload size limit besides using S3?

  • If I'm silly for wanting to avoid the S3/Step Function approach, answers explaining why my concerns are unwarranted would also be welcome.

Edit

Why do you even consider splitting the functionality of fetching the data and processing it into two different AWS Lambda functions?

Imagine that instead of just Lambda A, I have two dozen Lambdas that need to consume the functionality of Lambda B.

So, I package up (the functionality of) Lambda B, publish it to Nexus, and my other two dozen Lambdas all consume it at build time. All my Lambdas swell in size, and I have to publish more npm packages as I accumulate more "Lambda B"s. This is what I want to avoid.

I want my "Lambda A"s to consume other lambdas, rather than npm packages, for widely shared functionality. Maybe I am taking the "function" in "lambda function" too literally, or maybe I'm just trying to leverage FaaS to its full potential.

Mike Patrick
  • Is your requirement even a synchronous response to a request? I wonder how that would work with Step Functions, unless you trigger the Step Function from within an AWS Lambda Function, which would result in the same downsides as triggering an AWS Lambda Function instead. – Dunedan Dec 20 '17 at 21:27
  • Yes, it's synchronous in the sense that Lambda B can't start until Lambda A is done, and the caller needs the result of Lambda B's computation. There is no need for a lambda to trigger the step function; API Gateway can do this directly. The step function would in turn execute the two lambdas **in sequence**, returning the result of Lambda B to the caller. – Mike Patrick Dec 20 '17 at 22:24
  • When triggering a Step Function directly through API Gateway, don't you just get the ARN of the execution in the response instead of the result of the Step Function? At least that's what https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-api-gateway.html suggests. – Dunedan Dec 21 '17 at 06:34
  • @Dunedan great comment - you're absolutely correct. Step functions won't work for my use case, not because of any data limit, but because of their asynchronous nature. None of [the workarounds](https://stackoverflow.com/a/47910158/2999566) are very attractive. For future readers, [this take on step functions](https://serverless.zone/faas-is-stateless-and-aws-step-functions-provides-state-as-a-service-2499d4a6e412) does a good job of explaining what I was _hoping_ to get out of step functions. Thanks again for your valuable input. – Mike Patrick Dec 21 '17 at 11:36

2 Answers


From your question I can read the following requirements:

  • you need an AWS Lambda function (behind API Gateway) to act as an API-endpoint for a client application
  • your AWS Lambda function has to fetch data from a backend system and process it for consumption by the client application
  • such requests are synchronous and the faster they get answered the better (and cheaper of course)
  • the logic you need to run isn't too complicated and takes probably just a few milliseconds to execute

Why do you even consider splitting the functionality of fetching the data and processing it into two different AWS Lambda functions? Don't take the "function" in "AWS Lambda function" too literally: the code you run in an AWS Lambda function can be as complex as it needs to be. Just run everything in a single AWS Lambda function and split the code logically. That's the most efficient and cleanest way possible.
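A minimal sketch of this single-function layout, assuming a Node.js handler behind API Gateway. The module boundaries are indicated by comments; `fetchRawCustomer`, `formatCustomer`, and the field names are hypothetical, and the backend call is stubbed.

```javascript
// Hypothetical data-access module: calls the backend system (stubbed here).
async function fetchRawCustomer(customerId) {
  return { CUST_ID: customerId, CUST_NM: "ADA LOVELACE", CUST_EMAIL: "ada@example.com" };
}

// Hypothetical formatting module: a pure function with no I/O,
// so it is easy to unit-test on its own.
function formatCustomer(raw) {
  return { id: raw.CUST_ID, name: raw.CUST_NM, email: raw.CUST_EMAIL };
}

// The single Lambda handler: fetch, then format, in the same invocation,
// so no second function is billed while this one waits.
async function handler(event) {
  const raw = await fetchRawCustomer(event.pathParameters.id);
  return { statusCode: 200, body: JSON.stringify(formatCustomer(raw)) };
}
```

Keeping the formatting step pure means the logical split survives even though both steps deploy as one function.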

Dunedan
  • This is reasonable input and I appreciate it. Condensing the two functions into one is certainly the first thing I considered, and I may end up going that way. However, I believe I have good reasons to explore my options for keeping them separate, which is what I'm trying to do with this question. Upvote for "this answer is useful" nonetheless. – Mike Patrick Dec 20 '17 at 20:09

You didn't say how big your payload between processes A and B is. But if it's under 250 KB, I would suggest setting up an intermediate SQS queue where process A publishes results and process B is triggered by new messages in the queue.
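A minimal sketch of the queue hand-off this answer describes. The SQS client is injected as a hypothetical `sendMessage(params)` function so the code runs locally; the `QueueUrl`/`MessageBody` parameter names match the SQS SendMessage API, and `event.Records[].body` is the shape an SQS-triggered Lambda receives. The 250 KB threshold comes from the answer; `formatRecord` and the field names are hypothetical.

```javascript
// Size threshold taken from the answer; check the current SQS message-size quota.
const MAX_MESSAGE_BYTES = 250 * 1024;

// Process A: serialize the result and publish it if it fits.
// `sendMessage` is injected; real code would wrap the SQS SendMessage API.
async function publishResult(sendMessage, queueUrl, result) {
  const body = JSON.stringify(result);
  if (Buffer.byteLength(body, "utf8") > MAX_MESSAGE_BYTES) {
    throw new Error("payload too large for a single SQS message");
  }
  await sendMessage({ QueueUrl: queueUrl, MessageBody: body });
}

// Hypothetical reformatting logic of process B.
function formatRecord(raw) {
  return { id: raw.CUST_ID, name: raw.CUST_NM };
}

// Process B: an SQS-triggered Lambda receives a batch of messages in
// event.Records, each carrying the message text in `body`.
// The results are returned only so the sketch can be checked; an SQS trigger
// does not forward them back to the original API caller.
async function processBatch(event) {
  return event.Records.map((record) => formatRecord(JSON.parse(record.body)));
}
```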

Veilkrand