16

I am currently evaluating different messaging systems, and there is a question about Apache Kafka that I could not answer myself.

Is it possible for a Kafka producer to create topics dynamically, and to add partitions to existing topics? If so, are there any disadvantages to doing this?

Thanks in advance

smartwepa

4 Answers

25

Updated:

The Kafka broker has a property: auto.create.topics.enable

If it is set to true, the broker automatically creates a topic the first time a producer publishes a message to a topic name that does not exist yet.

The Confluent team recommends against this for two reasons: depending on your environment, the resulting explosion of topics can become unwieldy, and every auto-created topic gets the same broker defaults. It's important to have a replication factor of at least 3 to ensure the durability of your topics in the event of disk failure.

user2122031
  • Thanks. In my case I want to have a topic/partition for each device (producer). I do not know how many devices there will be, so I want to add them dynamically. The above solution sounds a bit “sluggish”. I guess a classic Pub/Sub system might work better. – smartwepa Apr 24 '17 at 08:03
6

When you start your Kafka broker you can define a number of properties in the config/server.properties file. One of them is auto.create.topics.enable. If it is set to true (the default), Kafka automatically creates a topic when you send a message to a topic that does not exist. The number of partitions is taken from the default settings in the same file (num.partitions).

Disadvantages: as far as I know, topics created this way always get the same default settings (partitions, replicas, ...).
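To make the "same defaults" point concrete, here is a minimal Java sketch that loads a server.properties-style fragment and reads the three broker settings that govern auto-created topics. The property names are real Kafka broker settings; the class name, the helper names and the values in the fragment are only illustrative assumptions, not a recommendation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class BrokerDefaultsSketch {
    // Illustrative fragment of a broker config file; the values are examples only.
    static final String SERVER_PROPERTIES =
            "auto.create.topics.enable=true\n"
          + "num.partitions=1\n"
          + "default.replication.factor=1\n";

    // Parses properties-format text into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Summarises what an auto-created topic would look like under these settings.
    static String describe(Properties props) {
        if (!Boolean.parseBoolean(props.getProperty("auto.create.topics.enable"))) {
            return "auto-creation disabled";
        }
        return "auto-created topics get " + props.getProperty("num.partitions")
                + " partition(s), replication factor "
                + props.getProperty("default.replication.factor");
    }

    public static void main(String[] args) {
        System.out.println(describe(load(SERVER_PROPERTIES)));
    }
}
```

Whatever values the broker has for these settings apply to every topic it creates implicitly, which is why a topic that needs different settings has to be created explicitly.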

ImbaBalboa
  • So in fact, because of the downside of having the same partition count for ALL topics, this is not a viable solution – andreagalle Nov 04 '22 at 13:58
3

From Java you can create a topic if needed. Whether that is recommended depends on the use case; for example, it can be useful if the topic name is a function of the payload coming in to the producer. The following code snippet works in Kafka 0.10.x:

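import java.util.Properties;

import kafka.admin.AdminUtils;
import kafka.admin.RackAwareMode;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;
// Also import TopicExistsException from the package that matches your Kafka version.

void createTopic(String zookeeperConnect, String topicName) throws InterruptedException {
    int sessionTimeoutMs = <some-int-value>;
    int connectionTimeoutMs = <some-int-value>;

    ZkClient zkClient = new ZkClient(zookeeperConnect, sessionTimeoutMs, connectionTimeoutMs, ZKStringSerializer$.MODULE$);
    try {
        boolean isSecureKafkaCluster = false;
        ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection(zookeeperConnect), isSecureKafkaCluster);

        int partitions = 1;
        int replicationFactor = 1;
        // Per-topic config overrides; left empty here.
        Properties topicConfig = new Properties();
        AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, topicConfig,
            RackAwareMode.Disabled$.MODULE$);
    } catch (TopicExistsException ex) {
        // The topic already exists; log it and carry on.
    } finally {
        // Always release the ZooKeeper connection, even if creation fails.
        zkClient.close();
    }
}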

Note: The number of partitions of an existing topic can only be increased, never decreased.
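One disadvantage of adding partitions to an existing topic is that keyed messages are assigned to a partition by hashing the key modulo the partition count, so the mapping changes when the count grows. The sketch below illustrates this with a stand-in hash (`Arrays.hashCode` over the UTF-8 bytes). This is an assumption for illustration only: Kafka's default partitioner uses a murmur2 hash, and the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartitionGrowthSketch {
    // Stand-in for key-based partitioning; NOT Kafka's actual murmur2-based partitioner.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Counts the keys that map to a different partition once the count changes.
    static int movedKeys(String[] keys, int before, int after) {
        int moved = 0;
        for (String key : keys) {
            if (partitionFor(key, before) != partitionFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        String[] keys = new String[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "device-" + i;
        }
        System.out.println(movedKeys(keys, 4, 6) + " of " + keys.length
                + " keys changed partition when going from 4 to 6 partitions");
    }
}
```

Messages already written stay where they are, so after an increase the older and newer messages for the same key may sit in different partitions, and per-key ordering across that boundary is no longer guaranteed.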

Bitswazsky
  • We use a similar approach to create topics on the fly. What about partitions? – user2105282 May 25 '18 at 08:13
  • @user2105282 The `AdminUtils.createTopic()` method takes both the number of partitions and the replication factor as arguments, so you can choose those accordingly. – Bitswazsky Jun 23 '18 at 08:21
1

For any messaging system, I don't think it is recommended to have the producer create topics, partitions or queues dynamically.

For your use case, you can probably use device_id as the partition key to distinguish the messages. That way you can use a single topic.
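As a rough illustration of the idea, the sketch below routes records by device id into a fixed number of partitions. Device ids do not need to be known in advance, and all records from one device end up in the same partition in arrival order. The hash is a stand-in (Kafka's default partitioner uses murmur2 on the serialized key), and the "deviceId:payload" record format and all names are assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DeviceKeySketch {
    // Stand-in for key-based partitioning; NOT Kafka's actual partitioner.
    static int partitionFor(String deviceId, int numPartitions) {
        int hash = Arrays.hashCode(deviceId.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Groups "deviceId:payload" records by target partition, keeping arrival order.
    // Assumes every record contains a ':' separator.
    static Map<Integer, List<String>> route(List<String> records, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String record : records) {
            String deviceId = record.substring(0, record.indexOf(':'));
            int partition = partitionFor(deviceId, numPartitions);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "sensor-1:20.5", "sensor-2:19.0", "sensor-1:20.7", "sensor-3:22.1");
        // A device that was never seen before is routed without any topic changes.
        route(records, 4).forEach((partition, recs) ->
                System.out.println("partition " + partition + " -> " + recs));
    }
}
```

Several devices can share a partition, so a consumer that cares about a single device still has to filter by key within the partition it reads.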

Girdhar Sojitra
  • I thought about that. The problem is that I do not know all devices/device ids. In other words, I want to add devices that publish data dynamically. – smartwepa Apr 26 '17 at 05:43
  • I don't think you need to worry about anticipating the keys (i.e. the devices). Kafka by default will assign partitions randomly. If you want to separate by device (i.e. key) you can create a stream that filters on the key name. – F. P. Freely Jun 08 '18 at 20:40
  • @Girdhar If using device_id, does the consumer first have to read all messages in the topic and then filter by device_id to get the relevant data? – Gadam Jul 28 '21 at 15:37