I'm using Apache Kafka. I dump huge databases into Kafka, where each database table is a topic.
I cannot delete a topic before it has been completely consumed. I cannot set a time-based retention policy because I don't know when a topic will be consumed. I have limited disk space and too much data. I have to write code that orchestrates consumption and deletion programmatically. I understand that the problem appears because we're using Kafka for batch processing, but I can't change the technology stack.
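To make "completely consumed" concrete, here is a minimal sketch of the check such orchestration might run before deleting a topic. It assumes that the per-partition end offsets and the consumer group's committed offsets have already been fetched; TopicDrainCheck and isFullyConsumed are hypothetical names, not part of Kafka.

```java
import java.util.Map;

// Hypothetical helper, not part of Kafka: decides whether a topic has been
// fully consumed by comparing offsets that the caller fetched beforehand.
final class TopicDrainCheck {
    private TopicDrainCheck() {}

    // endOffsets: partition id -> offset of the next record to be written.
    // committedOffsets: partition id -> offset committed by the consumer group.
    // A partition with no committed offset is treated as not consumed at all,
    // unless it is empty (end offset 0).
    static boolean isFullyConsumed(Map<Integer, Long> endOffsets,
                                   Map<Integer, Long> committedOffsets) {
        for (Map.Entry<Integer, Long> partition : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(partition.getKey(), 0L);
            if (committed < partition.getValue()) {
                return false; // this partition still has unread records
            }
        }
        return true;
    }
}
```

This check is only meaningful once the dump of the corresponding table has finished, since otherwise the end offsets can still grow after the check passes.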
What is the correct way to delete a consumed topic from the brokers?
Currently, I'm calling kafka.admin.AdminUtils#deleteTopic, but I can't find clear documentation for it. The method signature doesn't take any Kafka broker URLs. Does that mean I'm only deleting the topic's metadata, and the brokers' disk usage isn't reduced? If so, when does the actual deletion of the append-log files happen?