Splitting Kafka into separate topic or single topic/multiple partitions

Question

As usual, it's a bit confusing to see the benefits of one splitting method over the other.

I can't see the difference (pros and cons) between having
- Topic 1 -> P0 and Topic 2 -> P0
- and Topic 1 -> P0, P1

with a consumer pulling either from the two topics or from the single topic with two partitions, where P0 and P1 hold different event types or entities.

The only benefit I can see is that if another consumer needs only the Topic 2 data, it is easy to consume.

Regarding automatic topic creation, are there any benefits to that approach, or will it get out of hand after some time?

Thanks

Giorgos Myrianthous · Accepted Answer · 2019-08-28T15:20:32.170

I would say this decision depends on multiple factors:
- Logic/Separation of Concerns: You can decide whether to use multiple topics or multiple partitions based on the logic you are trying to implement. Normally, you need distinct topics for distinct entities. For example, say you want to stream users and companies. It doesn't make much sense to create a single topic with two partitions where the first partition holds users and the second one holds companies. Also, using a single topic with one partition per entity won't allow you to implement, for example, message ordering for users, which can only be achieved using keyed messages (messages with the same key are placed in the same partition).
- Host storage capabilities: A partition must fit in the storage of the host machine, while a topic can be distributed across the whole Kafka cluster by splitting it into multiple partitions. The Kafka docs shed some more light on this:
  
  The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit.
- Throughput: If you have high throughput, it makes more sense to create different topics per entity and split them into multiple partitions so that multiple consumers can join the consumer group. Don't forget that the level of parallelism in Kafka is defined by the number of partitions (and obviously active consumers).
- Retention Policy: Message retention in Kafka works at the partition/segment level, so you need to make sure that your partitioning, in conjunction with the retention policy you've picked, supports your use case.
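
To make the keyed-message point from the first bullet concrete, here is a minimal sketch in plain Python (no Kafka client involved). It assumes a hypothetical `users` topic with 3 partitions and uses CRC32 as a stand-in hash; Kafka's Java client uses murmur2, so the actual partition numbers would differ, but the idea is the same: the same key always lands in the same partition, so events for one user stay in order.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition (stand-in for a key-hashing partitioner)."""
    # Assumption: CRC32 is used only because it is deterministic and in the
    # standard library; Kafka's Java client hashes keys with murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, value) event to the partition log chosen by its key."""
    logs = {p: [] for p in range(num_partitions)}
    for key, value in events:
        logs[pick_partition(key, num_partitions)].append((key, value))
    return logs


# Hypothetical events for a `users` topic with 3 partitions.
events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "email_changed"),
    ("user-2", "deleted"),
    ("user-1", "deleted"),
]
logs = route(events, num_partitions=3)

for partition, log in logs.items():
    print(partition, log)
```

If partitions were instead reserved per entity type (users in P0, companies in P1), the key could no longer be used to spread a single entity's data over several partitions.
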

Coming to your second question, I am not sure what your requirement is or how this question relates to the first one. When a producer attempts to write a message to a Kafka topic that does not exist, the broker automatically creates that topic if auto.create.topics.enable is set to true. Otherwise, the topic won't be created and your producer will fail.

auto.create.topics.enable: Enable auto creation of topic on the server

Again, this decision should depend on your requirements and the desired behaviour. Normally, auto.create.topics.enable should be set to false in production environments so that topics are never created unintentionally (for example, because of a typo in a topic name).
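
As a rough illustration of why the flag matters, the toy model below mimics the behaviour described above in plain Python. `ToyBroker` and `UnknownTopicError` are hypothetical names invented for this sketch, not part of any Kafka client; the only thing modelled is whether writing to an unknown topic creates it or fails.

```python
class UnknownTopicError(Exception):
    """Hypothetical error for this toy model; not a real client exception."""


class ToyBroker:
    """Toy in-memory model of topic auto-creation; not a Kafka API."""

    def __init__(self, auto_create_topics: bool):
        # Models the broker setting auto.create.topics.enable.
        self.auto_create_topics = auto_create_topics
        self.topics = {}  # topic name -> list of messages

    def create_topic(self, name: str) -> None:
        """Explicit creation, as an operator would do in production."""
        self.topics.setdefault(name, [])

    def produce(self, topic: str, message: str) -> None:
        if topic not in self.topics:
            if not self.auto_create_topics:
                raise UnknownTopicError(topic)
            self.topics[topic] = []  # silently created on first write
        self.topics[topic].append(message)


# Dev-like setup: a typo quietly creates a second topic.
dev = ToyBroker(auto_create_topics=True)
dev.produce("users", "alice")
dev.produce("usres", "bob")

# Production-like setup: topics are created explicitly, typos fail fast.
prod = ToyBroker(auto_create_topics=False)
prod.create_topic("users")
prod.produce("users", "alice")
try:
    prod.produce("usres", "bob")
    typo_rejected = False
except UnknownTopicError:
    typo_rejected = True
```

With auto-creation on, the misspelled topic exists alongside the real one and nothing complains; with it off, the mistake surfaces immediately on the producer side.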

Regarding the point about separation of concerns, let's say that a service depends on both `users` and `companies`. How can concurrency be ensured in this case if they are on two different topics? Also, let's say that the broker holding the `users` topic went down. – Ahmed Alaa El-Din, Sep 03 '19 at 09:14

score 3 · Answer 2 · answered Aug 28 '19 at 15:59

Just adding a few things on top of Giorgos' answer:

By choosing the second approach over the first one, you would lose a lot of the features that Kafka offers, for example: data balancing across brokers, removing topics, consumer groups, ACLs, joins with Kafka Streams, etc.

As for auto.create.topics.enable, I think this flag can be compared to automatically creating tables in your database. It's handy in your dev environments, but you never want it to happen in production.

BogdanSucaciu


Thanks a lot, but can you please explain in more detail how choosing the 2nd approach would cost the consumer group its capabilities? – Ahmed Alaa El-Din Aug 29 '19 at 07:06
How does a consumer group work? Every consumer in the same group subscribes to the same topic, but each consumer is assigned different partitions. Basically, two consumers from the same consumer group cannot consume from the same partition. That's why there is a strict correlation between the number of partitions and the number of consumers in a consumer group: the number of partitions has to be >= the number of consumers, otherwise any extra consumer will remain idle. – BogdanSucaciu Aug 29 '19 at 07:12
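
The partition/consumer relationship described in the comment above can be sketched in a few lines of plain Python. This is a simplified round-robin assignment written for illustration only; Kafka's real assignors (range, round-robin, sticky, and so on) differ in detail, and the partition and consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Each partition gets exactly one owner within the group; a consumer
    may own several partitions or none at all.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical topic `users` with 3 partitions.
partitions = ["users-0", "users-1", "users-2"]

# Fewer consumers than partitions: every consumer has work.
two = assign(partitions, ["c1", "c2"])

# More consumers than partitions: the extra consumer stays idle.
four = assign(partitions, ["c1", "c2", "c3", "c4"])
idle = [c for c, owned in four.items() if not owned]

print(two)
print(four)
print("idle:", idle)
```

With three partitions and four consumers, one consumer ends up with nothing to read, which is why the number of partitions caps the useful size of a consumer group.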
