29

I read that you can have multiple consumer apps per Kinesis stream.

http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html

However, I heard you can only have one consumer per shard. Is this true? I can't find any documentation to support this, and I can't imagine how it could work if multiple consumers are reading from the same stream. Surely it doesn't mean the producer needs to repeat content in different shards for different consumers.

bhomass

3 Answers

25

The Kinesis Client Library (KCL) starts threads in the background, each of which listens to one shard in the stream. Within a single KCL application, you cannot read a shard from multiple threads; that is by design.

http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-scaling.html

For example, if your application is running on one EC2 instance, and is processing one Amazon Kinesis stream that has four shards. This one instance has one KCL worker and four record processors (one record processor for every shard). These four record processors run in parallel within the same process.

In the explanation above, the term "KCL worker" refers to a Kinesis consumer application, not to the threads.

But below, the same term "KCL worker" refers to a "Worker" thread in the application, which is a Runnable.

Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard.

See the Worker.java class in KCL source.
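To make the "one record processor per shard" rule concrete, here is a minimal sketch in plain Python. It is a toy model, not the KCL API: the function `assign_shards` and the round-robin policy are assumptions for illustration (the real KCL balances shard leases through a DynamoDB table). It only shows why extra instances beyond the shard count sit idle.

```python
def assign_shards(shards, workers):
    """Toy assignment: every shard gets exactly one owning worker.

    Round-robin is an illustrative assumption; the real KCL
    coordinates ownership through leases, not a fixed rotation.
    """
    assignment = {worker: [] for worker in workers}
    for index, shard in enumerate(shards):
        owner = workers[index % len(workers)]
        assignment[owner].append(shard)
    return assignment


shards = ["shard-0", "shard-1", "shard-2", "shard-3"]

# One instance: its single worker runs four record processors.
one_instance = assign_shards(shards, ["worker-A"])

# Five instances for four shards: one worker has nothing to process
# and is only useful as a failure standby.
five_instances = assign_shards(shards, ["w1", "w2", "w3", "w4", "w5"])

for worker, owned in five_instances.items():
    print(worker, owned)
```

With five workers and four shards, the last worker ends up with an empty list, which matches the advice that the number of instances should not exceed the number of shards.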

az3
  • 1
    I understand now. In this case, even if there are multiple instances, they are the same client application. I was thinking along the lines of Kafka, where independent applications can read from a single stream. – bhomass Dec 30 '15 at 00:16
  • 5
    @user1058511: You can. Kinesis supports the use case of multiple applications consuming the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data. You want both applications to consume data from the same stream concurrently and independently. – Leet-Falcon Feb 13 '16 at 09:33
  • 1
    I think I need to rephrase. In the case of Kafka, multiple consumer apps can participate in one consumer group so as not to process any one message repeatedly. In that sense, I should say "separate" rather than "independent". – bhomass Feb 14 '16 at 19:27
20

Late to the party, but the answer is that you can have multiple consumers per Kinesis shard. A KCL application will only start one record processor per shard, but you can have another KCL application consuming the same stream (and shard), assuming the second one has permission.

There are limits, though, as laid out in the docs, including:

Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
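As a rough back-of-the-envelope sketch of what that shared limit means, the snippet below divides the quoted per-shard figures across several polling consumer applications. The even split is an assumption made only for the arithmetic (the limit applies to the combined rate, and real consumers need not share it equally), and `per_consumer_budget` is a hypothetical helper, not an AWS API.

```python
# Per-shard read limits as quoted above.
SHARD_READ_TPS = 5          # read transactions per second
SHARD_READ_MB_PER_SEC = 2   # MB per second


def per_consumer_budget(num_consumers):
    """Split the shard's read budget evenly (an illustrative assumption)."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    reads_per_sec = SHARD_READ_TPS / num_consumers
    mb_per_sec = SHARD_READ_MB_PER_SEC / num_consumers
    # Shortest polling interval that keeps this consumer within its share.
    min_poll_interval_sec = 1 / reads_per_sec
    return reads_per_sec, mb_per_sec, min_poll_interval_sec


for n in (1, 2, 5):
    reads, mb, interval = per_consumer_budget(n)
    print(f"{n} consumer(s): {reads:.2f} reads/s, {mb:.2f} MB/s, "
          f"poll every {interval:.2f} s or slower")
```

Under this assumption, two applications on the same shard would each poll about every 0.4 seconds at most, and five applications would each be down to one read per second.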

If you want a stream with multiple consumers where each message will be processed once, you're probably better off with something like Amazon Simple Queue Service.

Cameron Stone
  • 8
    I would edit "processed once" to be "processed at least once". In SQS, you aren't guaranteed that a message will be processed once. You'll get at least once processing: https://stackoverflow.com/questions/37472129/using-many-consumers-in-sqs-queue – skeller88 Nov 08 '17 at 20:41
  • 1
    Would both KCL instances get the same data, or would the data be "round robined" across the EC2 instances? I'm looking for a solution where the consumer of the Kinesis Data Stream is constantly running without having to wait for another server to start up if one goes down (e.g. by having two servers always running, but avoiding processing the records twice). – Bernie Lenz Feb 21 '18 at 19:06
  • 14
    Each consumer gets the same data (managed by its checkpointing), and can consume it at whatever rate they want, independent of each other, similar to having two iterators. They're only coupled by their combined read limit. This is where Kinesis behaves differently to AWS SQS (which effectively has a single iterator). – Cameron Stone Feb 22 '18 at 22:49
  • 1
    SQS offers exactly-once processing with FIFO queues as of 2016: https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-sqs-introduces-fifo-queues-with-exactly-once-processing-and-lower-prices-for-standard-queues/ – Sinan Erdem Sep 30 '22 at 06:52
  • When you say another KCL instance, I assume you meant that it creates another lease DDB table, so this second application consumes the same stream and shards independently, not affected by the lease of the first application. – Eric Xin Zhang May 06 '23 at 09:13
  • But the doc says: https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html#shared-throughput-kcl-consumers-concepts **But if you have two consumer application instances: A and B with worker A and worker B, and these instances are processing a data stream with 4 shards, worker A and worker B cannot both hold the lease to shard 1 at the same time. One worker holds the lease to a particular shard until it is ready to stop processing this shard’s data records or until it fails. When one worker stops holding the lease, another worker takes up and holds the lease.** – Neetu Das Aug 18 '23 at 08:17
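The comments above can be reconciled with a small toy model, sketched here in plain Python. It assumes, as Eric Xin Zhang's comment suggests, that each consumer application has its own lease table and its own checkpoints. `ToyLeaseTable`, `consume`, and the application and worker names are all hypothetical and are not part of the KCL. Within one application, only one worker can hold the lease on a shard; a second application has a separate table and reads the same records independently.

```python
class ToyLeaseTable:
    """One table per consumer application (toy stand-in for a lease table)."""

    def __init__(self, application_name):
        self.application_name = application_name
        self.owners = {}       # shard_id -> worker_id holding the lease
        self.checkpoints = {}  # shard_id -> index of the next record to read

    def try_acquire(self, shard_id, worker_id):
        # The first worker to ask becomes the owner; others are refused.
        current = self.owners.setdefault(shard_id, worker_id)
        return current == worker_id

    def release(self, shard_id, worker_id):
        if self.owners.get(shard_id) == worker_id:
            del self.owners[shard_id]


def consume(table, shard_id, worker_id, records, max_records):
    """Read from this application's checkpoint if the worker holds the lease."""
    if not table.try_acquire(shard_id, worker_id):
        return []  # another worker of the same application owns the shard
    start = table.checkpoints.get(shard_id, 0)
    batch = records[start:start + max_records]
    table.checkpoints[shard_id] = start + len(batch)
    return batch


shard_1 = ["r0", "r1", "r2", "r3"]
dashboard = ToyLeaseTable("dashboard-app")
archiver = ToyLeaseTable("archiver-app")

from_a = consume(dashboard, "shard-1", "worker-A", shard_1, 3)
from_b = consume(dashboard, "shard-1", "worker-B", shard_1, 3)  # same app, refused
from_x = consume(archiver, "shard-1", "worker-X", shard_1, 2)   # other app, own cursor

# When worker A gives up the lease, worker B resumes from the app's checkpoint.
dashboard.release("shard-1", "worker-A")
after_failover = consume(dashboard, "shard-1", "worker-B", shard_1, 3)
```

In this sketch, worker B gets nothing while worker A holds the lease (the behavior the quoted doc describes for workers of one application), while the archiver application reads from the start of the shard regardless of what the dashboard application has already consumed.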
0

To keep it simple, you can have multiple different Lambda functions triggered by the Kinesis data. That way, each Lambda receives all the data from the stream. The downside is that you will have to increase throughput at the Kinesis level, which is going to be pricey. Use SQS instead for your use case.

Ankur Kothari