
I have 4 machines on which a Kafka cluster is configured with a topology where each machine has one ZooKeeper instance and two brokers.

With this configuration, what do you advise as the maximum number of topics and partitions for best performance?

Replication factor is 3, using Kafka 0.10.x.

Thanks.

Ahmet Karakaya
  • Just curious, what was the rationale behind going with two brokers per machine? I've only ever seen production machines set up with a single broker+zookeeper instance. – Mansoor Siddiqui Aug 10 '17 at 22:36
  • Actually, I came up with this configuration myself, so I am investigating the best topology. – Ahmet Karakaya Aug 11 '17 at 05:07
  • The best topology is to have a single Kafka broker on each machine. It is not a good idea to run multiple brokers on the same machine in production. – Hans Jespersen Aug 11 '17 at 06:41
  • @HansJespersen Thanks, I will take your advice into account while designing the new topology. First of all, I would like to know why running multiple broker instances on one machine is not a good idea. – Ahmet Karakaya Aug 11 '17 at 08:20
  • There is a great reference architecture paper that explains a lot of the reasons here: https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/. If you set a replication factor of 2, you would want to survive a node failure, but if you put two brokers on the same machine and they replicate data to one another, then when the machine dies you will have lost the primary data and the backup copy of the data at the same time. Also, these two brokers can interfere with each other when running on the same machine by competing for disk IO and kernel resources like the page cache. – Hans Jespersen Aug 11 '17 at 14:44
  • @HansJespersen I think replication would not be a problem when one machine dies with a replication factor like mine, but I agree that disk IO could be a bottleneck in case of failure. – Ahmet Karakaya Aug 11 '17 at 18:30
  • If you have a replication factor of 3 you expect to be able to survive 2 failures, but in your config, if the host with two brokers dies, you cannot survive a second host failure, nor could you do a rolling restart without causing a service disruption. – Hans Jespersen Aug 11 '17 at 19:20
  • @HansJespersen With the rack-aware feature, Kafka is able to store its replicated topic-partitions on another machine. And a dedicated disk partition can be allotted for each Kafka instance within a machine to write its data to. Apart from that, do you have any other concerns? – Kamal Chandraprakash Aug 12 '17 at 07:53
  • @KamalC The other concerns are that the Linux page cache will not be as effective with three processes sharing the machine's memory, and also that, should the box fail, the cluster will be failing over 2 brokers and 1 zookeeper all at the same time rather than just a single broker. It also adds administrative complexity because the two brokers on the same box need unique ports. You are better off buying more, smaller servers and running one broker on each, rather than paying a premium for big servers and then trying to run lots of brokers on the same machine, or virtual machines on the same host. – Hans Jespersen Aug 12 '17 at 18:20

1 Answer


Each topic is limited to 100,000 partitions no matter how many nodes you have (as of July 2017).

As for the number of topics, that depends on the amount of RAM on the machine with the least memory. This is because ZooKeeper keeps everything in memory for quick access (it also does not shard the znodes; it just replicates them across ZK nodes on write). This effectively means that once you exhaust one machine's memory, ZooKeeper will fail to add more topics. You will most likely run out of file handles on the Kafka broker nodes before reaching this limit.

To quote the Kafka docs (section 6.1, Basic Kafka Operations: https://kafka.apache.org/documentation/#basic_ops_add_topic):

Each sharded partition log is placed into its own folder under the Kafka log directory. The name of such folders consists of the topic name, appended by a dash (-) and the partition id. Since a typical folder name can not be over 255 characters long, there will be a limitation on the length of topic names. We assume the number of partitions will not ever be above 100,000. Therefore, topic names cannot be longer than 249 characters. This leaves just enough room in the folder name for a dash and a potentially 5 digit long partition id.
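
To make the arithmetic in that quote concrete, here is a small, hypothetical Java sketch (not code from the Kafka codebase) that builds a partition directory name the same way and enforces the 249-character topic-name limit it describes:

```java
// Hypothetical illustration of the folder-naming rule quoted above:
// each partition's log lives in a directory named "<topic>-<partitionId>".
// A folder name cannot exceed 255 characters and the partition id may be
// up to 5 digits (the docs assume fewer than 100,000 partitions), so topic
// names are capped at 255 - 1 (dash) - 5 (digits) = 249 characters.
public class TopicNameLimit {

    static final int MAX_FOLDER_NAME_LENGTH = 255;
    static final int MAX_PARTITION_ID_DIGITS = 5;   // partition ids 0..99999
    static final int MAX_TOPIC_NAME_LENGTH =
            MAX_FOLDER_NAME_LENGTH - 1 - MAX_PARTITION_ID_DIGITS; // 249

    static String partitionDirName(String topic, int partitionId) {
        if (topic.length() > MAX_TOPIC_NAME_LENGTH) {
            throw new IllegalArgumentException(
                    "Topic name exceeds " + MAX_TOPIC_NAME_LENGTH + " characters");
        }
        return topic + "-" + partitionId;   // e.g. "my-topic-0"
    }

    public static void main(String[] args) {
        System.out.println(partitionDirName("my-topic", 0));       // my-topic-0
        System.out.println(partitionDirName("my-topic", 99999));   // my-topic-99999
    }
}
```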

To quote the Zookeeper docs (https://zookeeper.apache.org/doc/trunk/zookeeperOver.html):

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

Performance:

Depending on your publishing and consumption semantics, the practical topic/partition limits will change. The following is a set of questions you should ask yourself to gain insight into a potential solution (your question is very open ended):

  • Is the data I am publishing mission critical (i.e. cannot lose it, must be sure I published it, must have exactly once consumption)?
  • Should I make the producer.send() call as synchronous as possible, or continue to use the asynchronous method with batching (do I trade off publishing guarantees for speed)? See the producer sketch after this list.
  • Are the messages I am publishing dependent on one another? Does message A have to be consumed before message B (implies A published before B)?
  • How do I choose which partition to send my message to? Should I assign the message to a partition myself (extra producer logic), let the cluster decide in a round-robin fashion, or assign a key that hashes to one of the partitions for the topic (you need an evenly distributed hash to get good load balancing across partitions)?
  • How many topics should you have? How is this connected to the semantics of your data? Will auto-creating topics for many distinct logical data domains be efficient (think of the effect on Zookeeper and administrative pain to delete stale topics)?
  • Partitions provide parallelism (more consumers possible) and possibly increased positive load balancing effects (if producer publishes correctly). Would you want to assign parts of your problem domain elements to specific partitions (when publishing send data for client A to partition 1)? What side-effects does this have (think refactorability and maintainability)?
  • Will you want to make more partitions than you need so you can scale up later with more brokers/consumers? How realistic is automatic scaling of a Kafka cluster given your expertise? Will this be done manually? Is manual scaling viable for your problem domain (are you building Kafka around a fixed system with well-known characteristics, or are you required to handle severe spikes in messages)?
  • How will my consumers subscribe to topics? Will they use pre-configured subscriptions or a regex to consume many topics (see the consumer sketch after this list)? Are the messages between topics dependent or prioritized (requiring extra consumer logic to implement priority)?
  • Should you use different network interfaces for replication between brokers (i.e. port 9092 for producers/consumers and 9093 for replication traffic)?
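
For the send-semantics and partition-choice questions above, here is a minimal producer sketch. It assumes the Java kafka-clients producer API; the broker address, topic name, and key are placeholder assumptions, and acks=all is just one possible durability setting:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("acks", "all");                         // strongest publish guarantee
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed record: the key is hashed to pick a partition, so all
            // messages for "client-A" land in the same partition (preserves order).
            ProducerRecord<String, String> keyed =
                    new ProducerRecord<>("my-topic", "client-A", "payload-1");

            // Synchronous send: block until the broker acknowledges.
            // Trades throughput for a publish guarantee.
            RecordMetadata meta = producer.send(keyed).get();
            System.out.printf("wrote to %s-%d at offset %d%n",
                    meta.topic(), meta.partition(), meta.offset());

            // Asynchronous send with no key: the pre-2.4 default partitioner
            // spreads these round-robin, and errors only surface in the callback.
            producer.send(new ProducerRecord<>("my-topic", "payload-2"),
                    (m, e) -> { if (e != null) e.printStackTrace(); });

            producer.flush();
        }
    }
}
```

With a key, ordering is preserved per key within a single partition; with a null key, the distribution across partitions depends entirely on the partitioner, which is where the load-balancing question above comes in.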
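
And for the subscription question, a matching consumer sketch (again with a placeholder broker, group id, and topic pattern) that subscribes by regex instead of a fixed topic list; older clients such as 0.10.x require the rebalance-listener overload used here:

```java
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RegexConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "orders-consumers");        // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to every topic whose name starts with "orders-".
            // The no-op listener is required by older clients; newer clients
            // also offer subscribe(Pattern) without it.
            consumer.subscribe(Pattern.compile("orders-.*"), new ConsumerRebalanceListener() {
                @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
            });

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(500);
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                            r.topic(), r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```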

Good Links:

  • http://cloudurable.com/ppt/4-kafka-detailed-architecture.pdf
  • https://www.slideshare.net/ToddPalino/putting-kafka-into-overdrive
  • https://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
  • https://kafka.apache.org/documentation/

PragmaticProgrammer