
Here is the simplified scheme I am trying to make work:

http requests --> (Gateway API + lambda A) --> SQS --> (lambda B ?????) --> DynamoDB

So it should work as shown: data coming from many HTTP requests (up to 500 per second, for example) is placed into an SQS queue by my lambda function A. Then the other function, B, processes the queue: it reads up to 10 items (on some periodic basis) and writes them to DynamoDB with BatchWriteItem.
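For illustration, the enqueue side (function A) can be very small. Here is a sketch using the AWS SDK for Java v1; the queue URL, class name, and the plain-String handler signature are placeholders of mine, not taken from the question:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class EnqueueLambda implements RequestHandler<String, String> {

    private static final AmazonSQS SQS = AmazonSQSClientBuilder.defaultClient();

    // Hypothetical queue URL; substitute your own.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue";

    @Override
    public String handleRequest(String body, Context context) {
        // Enqueue the raw request body and return immediately, so the
        // HTTP client gets its 200 OK without waiting on DynamoDB.
        SQS.sendMessage(QUEUE_URL, body);
        return "OK";
    }
}
```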

The problem is that I can't figure out how to trigger the second lambda function. It should be called frequently, multiple times per second (or at least once per second), because I need all the data from the queue to get into DynamoDB ASAP (that's why calling lambda function B via scheduled events as described here is not an option).


Why don't I want to write directly into DynamoDB, without SQS?

It would be great to avoid using SQS at all. The problem I am trying to address with SQS is DynamoDB throttling. And not even the throttling itself, but the way the AWS SDK handles it while writing data to DynamoDB: when records are written one by one and get throttled, the SDK silently retries, increasing the request processing time from the HTTP client's point of view.

So I would like to temporarily store the data in a queue, send a "200 OK" response back to the client, and then have the queue processed by a separate function that writes multiple records with a single DynamoDB BatchWriteItem call (which returns unprocessed items instead of retrying automatically in case of throttling). I would even prefer to lose some records over increasing the lag between a record being received and stored in DynamoDB.
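A sketch of what function B's write step could look like with the AWS SDK for Java v1. The chunk size of 25 is DynamoDB's per-call limit for BatchWriteItem; the class name and what to do with unprocessed items are assumptions of mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.PutRequest;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

public class BatchWriter {

    // BatchWriteItem accepts at most 25 write requests per call,
    // so split the drained queue items into chunks of that size.
    static <T> List<List<T>> chunks(List<T> items, int size) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            result.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return result;
    }

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    void writeAll(String table, List<Map<String, AttributeValue>> items) {
        for (List<Map<String, AttributeValue>> batch : chunks(items, 25)) {
            List<WriteRequest> writes = new ArrayList<>();
            for (Map<String, AttributeValue> item : batch) {
                writes.add(new WriteRequest(new PutRequest(item)));
            }
            BatchWriteItemResult result = dynamo.batchWriteItem(Map.of(table, writes));
            // Throttled writes come back here instead of being retried
            // silently by the SDK; re-send them, or log and drop.
            if (!result.getUnprocessedItems().isEmpty()) {
                // e.g. re-queue result.getUnprocessedItems()
            }
        }
    }
}
```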

UPD: If anyone is interested, I have found out how to make aws-sdk skip automatic retries in case of throttling: there is a special maxRetries parameter. Anyway, I'm going to use Kinesis as suggested below.
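(maxRetries is the JavaScript SDK's setting; to my knowledge, the closest equivalent knob in the Java SDK v1 is ClientConfiguration.withMaxErrorRetry. A rough sketch, not verified against every SDK version:)

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class NoRetryClient {
    // Zero retries: a throttled write fails fast instead of silently
    // stretching the HTTP response time.
    static AmazonDynamoDB build() {
        return AmazonDynamoDBClientBuilder.standard()
                .withClientConfiguration(new ClientConfiguration().withMaxErrorRetry(0))
                .build();
    }
}
```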

Cœur
xtx
  • What's the reason why it wouldn't work out – the count/number on the scheduled event source? – Naveen Vijay Jan 08 '16 at 14:18
  • Maybe I am missing something, but from what I see in the [docs](http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html) the minimum scheduling rate is 5 minutes. But I need to be able to run my function each second. I'm totally new to all this stuff, so please advise. – xtx Jan 08 '16 at 15:10
  • Understood, the latency of 5 mins. If that's the case, then you have to fall back on a custom process / script running in EC2 which moves the items from SQS to DynamoDB. I am also thinking about AWS Data Pipeline, which has a schedule option accepting a text-box value of 1 and a combo of min as the time frequency. – Naveen Vijay Jan 08 '16 at 15:18
  • Yes, a custom process in EC2 instead of a lambda function would solve all the problems; that would be my last fallback option. – xtx Jan 08 '16 at 16:22
  • I've tried to accomplish this sort of thing in Lambda before as well. I gave up. Any integration of SQS with Lambda currently feels like a huge hack. I just have a service running on a t2.nano instance continually polling SQS for now. Hopefully Amazon will add some form of integration between SQS and Lambda in the future. If you aren't averse to looking outside AWS, you might look at IronIO which has IronMQ/IronWorker integration that looks similar to what I wish AWS had. – Mark B Jan 08 '16 at 16:55
  • AWS added native support on June 28, 2018: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/ – Trenton Jun 28 '18 at 22:39

8 Answers

89

[This doesn't directly answer your explicit question, so in my experience it will be downvoted :) However, I will answer the fundamental problem you are trying to solve.]

The way we take a flood of incoming requests and feed them to AWS Lambda functions for writing in a paced manner to DynamoDB is to replace SQS in the proposed architecture with Amazon Kinesis streams.

Kinesis streams can drive AWS Lambda functions.

Kinesis streams guarantee ordering of the delivered messages for any given key (nice for ordered database operations).

Kinesis streams let you specify how many AWS Lambda functions can be run in parallel (one per shard), which can be coordinated with your DynamoDB write capacity.

Kinesis streams can pass multiple available messages in one AWS Lambda function invocation, allowing for further optimization.

Note: It's really the AWS Lambda service that reads from Amazon Kinesis streams then invokes the function, and not Kinesis streams directly invoking AWS Lambda; but sometimes it's easier to visualize as Kinesis driving it. The result to the user is nearly the same.
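A minimal sketch of what the Kinesis-driven function might look like on the Java Lambda runtime (the class name and what you do with each payload are assumptions, not part of this answer):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent.KinesisEventRecord;

public class KinesisConsumerLambda implements RequestHandler<KinesisEvent, String> {

    // Kinesis record payloads arrive as a raw ByteBuffer.
    static String asString(ByteBuffer data) {
        return StandardCharsets.UTF_8.decode(data.duplicate()).toString();
    }

    @Override
    public String handleRequest(KinesisEvent event, Context context) {
        for (KinesisEventRecord record : event.getRecords()) {
            String payload = asString(record.getKinesis().getData());
            // collect payloads here and hand them to BatchWriteItem
            context.getLogger().log(payload);
        }
        return "Ok";
    }
}
```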

Eric Hammond
  • This is great. I've been having huge issues with SNS throttling going from a Spark cluster to S3 to Lambda (since AWS upped our Lambda functions from 100 to 6,000!) and have been doing all sorts of things trying to constrain the rate at which we deliver messages. Your solution of using Lambda with Kinesis seems just what I'm looking for. – cfeduke Aug 05 '16 at 19:40
  • Hi Eric, I like this approach. I have just one question: is it possible to batch messages in time-based batches, say 1 hour, and then pass the batch to Lambda? – luisfarzati Sep 27 '16 at 00:45
  • @luisfarzati Drop the messages into a Kinesis stream where they will buffer up. Schedule an AWS Lambda function to run once an hour. It reads the messages off the stream and processes them. Note that the AWS Lambda function needs to complete its work in a maximum of 5 minutes under current limitations. – Eric Hammond Sep 27 '16 at 09:12
  • Kinesis is good but only if you have the $$$ - it's a bit expensive. – bjfletcher Mar 21 '17 at 10:20
  • The problem is that Kinesis streams don't have the high-availability "at-least-once" guarantee for each message that SQS has. It's acceptable to lose a few messages in the firehose when processing big data, for example, but it's not acceptable to lose a single request in certain applications. – Isa Hassen Apr 30 '17 at 21:00
  • Best approach, but a lot more expensive than SQS. My manager would have to fire me to afford the AWS ecosystem... lol – TriCore May 16 '17 at 04:21
  • AWS pricing is always changing (downwards), but I'm confused by the cost comments above. For most uses like those being discussed, Amazon Kinesis should be noticeably cheaper than Amazon SQS. – Eric Hammond May 21 '17 at 22:51
  • Kinesis is more expensive if you need to handle bursty processing every now and then. It's cheaper if you have constant data coming in. – Gareth McCumskey Jun 08 '18 at 08:57
21

You can't do this with a direct integration between SQS and Lambda, unfortunately. But don't fret too much yet. There is a solution! You need to add another Amazon service into the mix and all your problems will be solved.

http requests --> (Gateway API + lambda A) --> SQS + SNS --> lambda B --> DynamoDB

You can trigger an SNS notification to the second lambda service to kick it off. Once it is started, it can drain the queue and write all the results into DynamoDB. To better understand possible event sources for Lambda check out these docs.
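In code, Lambda A would do both steps back to back: store the data durably in SQS, then ping Lambda B via SNS. A rough AWS SDK for Java v1 sketch (the queue URL and topic ARN are placeholders of mine):

```java
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class EnqueueAndNotify {

    private static final AmazonSQS SQS = AmazonSQSClientBuilder.defaultClient();
    private static final AmazonSNS SNS = AmazonSNSClientBuilder.defaultClient();

    // Hypothetical values; substitute your own.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue";
    private static final String TOPIC_ARN =
            "arn:aws:sns:us-east-1:123456789012:drain-queue";

    static void accept(String body) {
        SQS.sendMessage(QUEUE_URL, body); // durable copy of the data
        SNS.publish(TOPIC_ARN, "drain");  // ping that wakes Lambda B
    }
}
```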

jb.
Chris Franklin
  • Thanks for suggesting! I thought about using SNS to call the second lambda function, but was confused by the fact that lambda B would be called for each incoming http request (in my case approx. 500 per second), though I'd ideally like it to be called once a second. I worried about that overhead: 500 times per second vs. the one time that was actually needed. So I'm going to look into Kinesis as @Eric Hammond suggested. – xtx Jan 15 '16 at 10:46
  • I keep forgetting about Kinesis. We use Kafka for our event stream (which Kinesis is based on). That is a much better solution than using SNS in your case! – Chris Franklin Jan 15 '16 at 14:24
  • You cannot subscribe an SNS to an SQS. – khebbie Mar 18 '16 at 12:51
  • I didn't say it could. I was suggesting the lambda trigger the SNS at the same time as putting the message in SQS. Since the question was how to process events in SQS with lambda, I didn't think I had to restate that. – Chris Franklin Mar 23 '16 at 15:27
  • This works a treat - I had the issue of not being able to subscribe to the SNS directly as I'm using a FIFO SQS, which isn't currently supported. In the callback of adding to the SQS queue, I am now publishing to the SNS topic and have the second lambda subscribe to that SNS topic. Great, thanks! – andy mccullough Jan 29 '18 at 14:27
  • This is a great solution - it gets around the fact that using Lambda functions to poll on SQS (especially long-polling) is not a great design, by telling the Lambda function "poll now, message is on the way". A bit like phoning someone to tell them to check their email ASAP! – RichVel Feb 04 '18 at 12:46
18

As of June 28, 2018, you can now use SQS to trigger AWS Lambda functions natively. Workarounds are no longer needed!

https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/

And in Nov 2019, support for FIFO queues was added:

https://aws.amazon.com/blogs/compute/new-for-aws-lambda-sqs-fifo-as-an-event-source/

Trenton
  • At the moment the Lambda triggers only work with standard queues and not FIFO queues. – Brett Jul 09 '18 at 22:39
  • Brett, your comment is the sort of reason I come to Stack Overflow before official docs. (Like [this aws doc](https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html#supported-event-source-sqs), which doesn't mention it.) Is there any SO mechanism for expediting a newly-true answer to the top? – jameslol Jul 18 '18 at 23:28
10

Another solution would be to just add the item to SQS, then invoke the target Lambda function with invocation type Event so it runs asynchronously.

The asynchronous Lambda can then get as many items from SQS as you want and process them.

I would also add a scheduled call to the asynchronous Lambda to handle any items in the queue that were in error.

[UPDATE] You can now set up a Lambda trigger on new messages in the queue
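The asynchronous call itself is a one-liner with invocation type Event. A sketch using the AWS SDK for Java v1; the function name is hypothetical:

```java
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.InvocationType;
import com.amazonaws.services.lambda.model.InvokeRequest;

public class AsyncKickoff {

    private static final AWSLambda LAMBDA = AWSLambdaClientBuilder.defaultClient();

    // Fire-and-forget: InvocationType.Event returns immediately,
    // so Lambda A isn't blocked while Lambda B drains the queue.
    static void kickOffDrainer() {
        LAMBDA.invoke(new InvokeRequest()
                .withFunctionName("lambda-b-queue-drainer") // hypothetical name
                .withInvocationType(InvocationType.Event));
    }
}
```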

loopingz
  • I like this solution. In my case the database might be down, so having lambda b run on a schedule is good for retrying to process the queue. What I haven't figured out is, how would you scale up to multiple lambda b consumers if the queue starts to get busy? – mozey Apr 11 '17 at 16:03
  • You can set up an alarm on CloudWatch that triggers an SNS topic, which triggers the Lambda that polls SQS; this way you increase the number of pollers. You can also create a specific Lambda that launches more than one Lambda asynchronously based on the number of items available. – loopingz Apr 11 '17 at 22:04
  • Question for anyone doing this: do you find that CloudWatch events are delivered on time? With rate(5 mins) or cron(5/* etc), I've observed between 0 and 3 messages per 5 minute window. Not ideal! – jelder Feb 28 '18 at 17:25
  • The SQS Lambda trigger has been added since this question – loopingz Jun 27 '18 at 13:23
7

Maybe a more cost-efficient solution would be to keep everything in SQS (as it is), then run a scheduled event that invokes a multi-threaded Lambda function that processes items from the queue?

This way, your queue worker can match your limits exactly. If the queue is empty, the function can finish prematurely or start polling in a single thread.

Kinesis sounds like overkill for this case – you don't need the original order, for instance. Plus, running multiple Lambdas simultaneously is surely more expensive than running just one multi-threaded Lambda.

Your Lambda will be all about I/O, making external calls to AWS services, so one function may fit very well.

Denis Mysenko
1

Here's how I collect messages from an SQS queue:

package au.com.redbarn.aws.lambda2lambda_via_sqs;

import java.util.List;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import com.amazonaws.services.lambda.runtime.events.SQSEvent.SQSMessage;

import lombok.extern.log4j.Log4j2;

@Log4j2
public class SQSConsumerLambda implements RequestHandler<SQSEvent, String> {

    @Override
    public String handleRequest(SQSEvent input, Context context) {

        log.info("message received");

        List<SQSMessage> records = input.getRecords();

        for (SQSMessage record : records) {
            log.info(record.getBody());
        }

        return "Ok";
    }
}

Add your DynamoDB code to handleRequest() and Lambda B is done.

Peter Svehla
0

Here's my solution to this problem:

HTTP request --> DynamoDb --> Stream --> Lambda Function

In this solution, you have to set up a stream on the table. The stream is handled by a Lambda function that you write, and that's it. No need for SQS or anything else.
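A skeleton of such a stream handler on the Java Lambda runtime might look like this (a sketch; the class name is made up):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

public class StreamHandlerLambda implements RequestHandler<DynamodbEvent, String> {

    @Override
    public String handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord record : event.getRecords()) {
            // Event name is INSERT, MODIFY, or REMOVE; the new item image
            // is available via record.getDynamodb().getNewImage().
            context.getLogger().log(record.getEventName());
        }
        return "Ok";
    }
}
```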

Of course, this is a simplified design and it works only for simple problems. For more complicated scenarios, use Kinesis (as mentioned in the other answers).

Here's a link to AWS documentation on the topic.

Mehran
0

I believe AWS has now come up with a way for SQS to trigger a Lambda function. So I guess we can use SQS for smoothing burst loads of data to DynamoDB, in case you don't care about the order of messages. Check their blog on this new update: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/

Dipayan