95

Sorry if this is a newbie question, but I'm trying to understand what I should use. As far as I understand, Kafka is:

Apache Kafka is a distributed publish-subscribe messaging system.

And SNS is also a pub/sub system.

My goal is to use a message queue system on AWS with an application that will be distributed over a few servers (by the way, the main language is Python). Because it is on Amazon, my first thought was to use SNS and SQS. But then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?

Vor
  • Kafka has a replication factor for replication. Why do you say that messages are not replicated in Kafka? Check "default.replication.factor" at http://kafka.apache.org/08/configuration.html . – skhurana Dec 25 '13 at 17:19

3 Answers

119

The use-cases for Kafka and Amazon SQS/Amazon SNS are quite different.

Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.

An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
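A small sketch of what offset-based acknowledgement means in practice. Again this is a toy in-memory log, not Kafka itself; `OffsetLog` and its methods are hypothetical names, and here `commit` takes the offset of the last processed record.

```python
class OffsetLog:
    """Toy append-only log with one committed position per consumer group."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group name -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group):
        # Return (offset, record) pairs that the group has not committed yet.
        start = self.committed.get(group, 0)
        return list(enumerate(self.records))[start:]

    def commit(self, group, offset):
        # Committing offset N marks every record up to and including N as processed.
        self.committed[group] = offset + 1


log = OffsetLog()
for payload in ["a", "b", "c", "d"]:
    log.append(payload)

# Suppose the consumer handled "a" and "c" but failed on "b".
# It cannot acknowledge "c" alone: committing offset 2 would also cover "b".
log.commit("workers", 0)  # the only safe commit is up to "a"

# "b", "c" and "d" come back, including the already handled "c".
redelivered = log.poll("workers")
```

Note that nothing is deleted from the log on commit; only the group's position moves, which is why other readers can still catch up later.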

SQS/SNS on the other hand:

  • no setup/no maintenance
  • either a queue (SQS) or a topic (SNS)
  • various limitations (on size, how long a message lives, etc)
  • limited throughput: you can do batch and concurrent requests, but achieving high throughput would still be expensive
  • I'm not sure if the messages are replicated; however, the at-least-once delivery guarantee in SQS would suggest so
  • SNS has notifications for email, SMS, SQS, HTTP built-in. With Kafka, you would probably have to code it yourself
  • no "message stream" concept
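As a rough illustration of the batching mentioned in the list above: SQS accepts at most 10 messages per batch request, so a producer usually chunks its messages first. The helper below only builds the entry lists and makes no AWS calls; `build_batches` and the `event-N` bodies are made-up names for this sketch.

```python
def build_batches(bodies, batch_size=10):
    """Group message bodies into lists shaped like SQS batch entries."""
    if not 1 <= batch_size <= 10:
        raise ValueError("SQS accepts between 1 and 10 entries per batch")
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # Ids only need to be unique within a single batch request.
        batches.append(
            [{"Id": str(start + i), "MessageBody": body} for i, body in enumerate(chunk)]
        )
    return batches


batches = build_batches([f"event-{n}" for n in range(25)])
```

Each inner list could then be passed as `Entries` to boto3's `send_message_batch(QueueUrl=..., Entries=batch)`. Since billing is per request, batching cuts the request count, but very high volumes still mean many requests.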

So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.

adamw
  • Thank you for the reply. I spent some time reading about Kafka, and I have to say that I like it a lot. But one part is still unclear. In Kafka's producer-broker-consumer model, the producer should be a very sophisticated mechanism that will use some load balancer, probably with a round-robin algorithm... Am I right? Because from my point of view, one producer can load more into the broker than a "similar" consumer can read, right? – Vor May 09 '13 at 12:47
  • No, producers can be simple, just sending messages to Kafka. See "Automatic producer load balancing" in http://kafka.apache.org/07/design.html. – adamw May 12 '13 at 08:05
  • According to the Kafka documentation at http://kafka.apache.org/documentation.html, Kafka can be used as a traditional queue: "If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers." – Kirby Mar 18 '15 at 21:44
  • Update/Correction: Kafka supports replication. Ref: http://kafka.apache.org/documentation.html#replication – amey91 Sep 18 '15 at 03:19
  • At this time, Kafka most definitely supports replication and persistence. I think that makes this answer invalid, even though it once made sense. – nichochar Jun 09 '16 at 16:11
  • @nichochar of course, I edited the answer, it was written 3 years ago :) – adamw Jun 09 '16 at 19:10
  • SQS now advertises "unlimited throughput" with its standard queues. Throughput on the newer "FIFO Queues with Exactly Once Processing" is quite limited in comparison: https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-sqs-introduces-fifo-queues-with-exactly-once-processing-and-lower-prices-for-standard-queues/ – sixty4bit Sep 02 '17 at 14:29
74

This is a classic trade-off:

AWS tools (SQS, SNS)

These will be easier for you to set up and integrate with the rest of your architecture, especially if most of it is already running on AWS. They will also probably be cheaper at first, since they have a good pay-as-you-go model, but the cost will not scale as well, so you have to think about that.

Apache Kafka

Here, you're using a highly popular (not trendy) distributed (this is important if you think you will scale a lot) pub/sub model. Nowadays, this model seems to be much preferred, since running analytics on the data going through the pipes is very common, and usually with an SOA you can have a multitude of small services consuming the messages and doing their thing, without the data being removed from the queue. You also get a lot of configuration options, so depending on your use case you can fine-tune it to your needs. This means more work, but a more optimized service down the road.

Summary

This is a classic trade-off between speed and ease of development on one side and a highly modular, customized solution on the other. The latter has more overhead for the first implementation but scales better.

Personal Advice

If you are prototyping something, favor speed of development, so go with the AWS tools. If your requirements are frozen and require significant scale, definitely take the time to use Kafka. I am also a big believer in using-open-source-makes-the-world-better, but that's not the strongest argument here.

bflemi3
nichochar
4

The points mentioned above are really helpful. In addition to them:

  1. It's very difficult to make SQS/SNS multi-tenant; perhaps there is no way other than creating a separate queue for each tenant (very hard to maintain).
  2. Kafka is clusterable; a cluster connects to apps and databases in real time and provides key/value access to data. The retention period for messages, distribution, and replication are bigger advantages, whereas SQS is more of a black box: a sender sends a message, and a receiver receives it, marks it as processed, and deletes it.
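One way to read the multi-tenancy and key/value points together: with keyed messages, a single topic can carry several tenants, because every message with the same key lands in the same partition and stays in order there. The sketch below only illustrates that idea under my own assumptions; `partition_for`, `route`, and the tenant names are made up, and `zlib.crc32` is a stand-in for Kafka's own default partitioner hash (murmur2).

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition index, stable across runs."""
    # crc32 is a stand-in hash for this sketch, not what Kafka actually uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Place (tenant, payload) pairs into partitions by tenant key."""
    partitions = [[] for _ in range(num_partitions)]
    for tenant, payload in messages:
        partitions[partition_for(tenant, num_partitions)].append((tenant, payload))
    return partitions


messages = [("tenant-a", 1), ("tenant-b", 1), ("tenant-a", 2), ("tenant-b", 2)]
partitions = route(messages, num_partitions=3)
```

Each tenant's messages end up in exactly one partition, in the order they were sent, without a separate queue per tenant.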
Pravin Bansal