I'm using Apache Kafka. I dump huge databases into Kafka, where each database table is a topic.

I cannot delete a topic before it has been completely consumed. I cannot set a time-based retention policy because I don't know when a topic will be consumed. I have limited disk space and too much data, so I have to write code that orchestrates consumption and deletion programmatically. I understand that the problem appears because we're using Kafka for batch processing, but I can't change the technology stack.

What is the correct way to delete a consumed topic from the brokers?

Currently, I'm calling `kafka.admin.AdminUtils#deleteTopic`, but I can't find clear documentation for it. The method signature doesn't take Kafka server URLs. Does that mean I'm deleting only the topic's metadata and the brokers' disk usage isn't reduced? If so, when does the actual deletion of the append-log files happen?

VB_

1 Answer

Instead of using a time-based retention policy, are you able to use a size-based policy? `log.retention.bytes` is a per-partition setting that might help you out here.
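Because the setting is per partition, the disk a topic can hold across the cluster scales with its partition count and replication factor. A rough back-of-the-envelope sketch follows; the function name is made up for illustration, and it assumes that retention removes only whole segments, so each replica can overshoot the limit by about one segment (1 GiB is the default `log.segment.bytes`).

```python
GIB = 1024 ** 3


def max_retained_bytes(retention_bytes: int,
                       partitions: int,
                       replication_factor: int,
                       segment_bytes: int = GIB) -> int:
    """Rough upper bound on cluster-wide disk held by one topic.

    Hypothetical helper: assumes each partition replica can hold up to
    retention_bytes plus roughly one extra segment, because size-based
    retention drops whole segments only.
    """
    per_replica = retention_bytes + segment_bytes
    return per_replica * partitions * replication_factor


# Example: 5 GiB limit, 6 partitions, replication factor 3
print(max_retained_bytes(5 * GIB, 6, 3) // GIB, "GiB")
```

Treat the result as an estimate for capacity planning, not as a guarantee from the broker.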

I'm not sure how you'd want to determine that a topic is fully consumed, but calling `deleteTopic` on the topic only marks it for deletion at first. Once no consumers or producers are connected to the cluster and accessing that topic, and provided `delete.topic.enable` is set to `true` in your `server.properties` file, the controller deletes the topic from the cluster as soon as it is able to. This includes purging the data from disk. It can take anywhere from a few seconds to several minutes.
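One possible way to decide that a topic is "fully consumed" is to compare each consumer group's committed offsets with the log end offsets of every partition. The sketch below covers only that decision step, in plain Python. All names are hypothetical, and it assumes you have already fetched the offsets with whichever client you use and that no producer is still writing to the topic.

```python
from typing import Dict, List

Offsets = Dict[int, int]  # partition number -> offset


def is_fully_consumed(end_offsets: Offsets,
                      committed_by_group: Dict[str, Offsets]) -> bool:
    """True if every group has committed up to the end of every partition."""
    if not committed_by_group:
        # No group has reported progress, so assume nothing was consumed.
        return False
    for committed in committed_by_group.values():
        for partition, end in end_offsets.items():
            # A missing commit counts as offset 0 (nothing consumed yet).
            if committed.get(partition, 0) < end:
                return False
    return True


def topics_safe_to_delete(end_offsets_by_topic: Dict[str, Offsets],
                          committed_by_topic: Dict[str, Dict[str, Offsets]]
                          ) -> List[str]:
    """Names of topics whose partitions are all fully consumed."""
    return [
        topic
        for topic, end_offsets in end_offsets_by_topic.items()
        if is_fully_consumed(end_offsets, committed_by_topic.get(topic, {}))
    ]
```

The topics this returns are candidates for the delete call. In the Java client, `AdminClient#deleteTopics` is one way to issue it.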

Simon Clark
  • 1
    Thank you for your answer! I can't use a size-based retention policy since I don't know enough about the data. A delay of a few minutes before deletion is okay. I can know precisely when my data is consumed, since I write the consumers :) Some data is consumed only once. – VB_ Aug 14 '18 at 15:28
  • 1
    Could you please provide a reference to the documentation? I will accept your answer. – VB_ Aug 14 '18 at 15:29
  • 2
    Weirdly, I can't find much at all in the Apache Kafka documentation as far as topic deletion goes. The `delete.topic.enable` switch is detailed under the [broker config section](https://kafka.apache.org/documentation/#brokerconfigs), and there's a [brief add/remove topic section](https://kafka.apache.org/documentation/#basic_ops_add_topic) that doesn't really say a lot. I've based my answer off of personal experience with Kafka, and what I've found [here](https://stackoverflow.com/questions/16284399/purge-kafka-topic) which suggests that `--delete` does in fact purge the data on disk. – Simon Clark Aug 14 '18 at 15:43