
I am testing and debugging an event-sourcing (or stateful stream processing) application that runs on top of Kafka and uses Samza. I want to delete the queues and topics in Kafka so that the Samza jobs start with an empty Kafka installation.

How can I do this?


Edit:

The question is more complex and specific than what I originally wrote.

As David points out, there is a way to purge a topic starting from Kafka 0.8.2: Purge Kafka Queue

What I am interested in is setting up a testing environment that automatically starts ZooKeeper and Kafka (which are bundled in my Git repository as binary packages).

I am using Gradle, Eclipse and JUnit. I run the integration tests from Eclipse (as JUnit tests).

How could the startup be automated? Should I create a specific test class that sets up the environment and launches Kafka and ZooKeeper? Is there any reference example or code? The idea would be to start the environment, run a few tests, and stop it. Ideally the whole cycle would take only a few seconds.
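One way to automate this, sketched here under assumptions: a small helper that starts the bundled ZooKeeper and Kafka with ProcessBuilder and stops them afterwards. The class name BundledKafka is hypothetical, and it assumes kafkaHome has the standard Unix distribution layout (bin/zookeeper-server-start.sh, bin/kafka-server-start.sh, config/zookeeper.properties, config/server.properties). The fixed sleeps are a crude stand-in for polling ports 2181/9092 until they accept connections.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: starts/stops the ZooKeeper and Kafka bundled under kafkaHome. */
class BundledKafka {
    private final Path kafkaHome;
    private Process zookeeper;
    private Process kafka;

    BundledKafka(Path kafkaHome) {
        this.kafkaHome = kafkaHome;
    }

    /** Builds the command line for one of the distribution's bin/ scripts. */
    static List<String> command(Path kafkaHome, String script, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(kafkaHome.resolve("bin").resolve(script).toString());
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    private Process launch(String script, String... args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command(kafkaHome, script, args));
        pb.directory(kafkaHome.toFile());
        pb.inheritIO(); // show ZooKeeper/broker logs in the Eclipse or Gradle console
        return pb.start();
    }

    void start() throws IOException, InterruptedException {
        zookeeper = launch("zookeeper-server-start.sh",
                kafkaHome.resolve("config/zookeeper.properties").toString());
        Thread.sleep(2000); // crude: give ZooKeeper time to bind its port
        kafka = launch("kafka-server-start.sh",
                kafkaHome.resolve("config/server.properties").toString());
        Thread.sleep(5000); // crude: give the broker time to register in ZooKeeper
    }

    void stop() throws InterruptedException {
        // Stop the broker before ZooKeeper so it can shut down cleanly.
        if (kafka != null) { kafka.destroy(); kafka.waitFor(); }
        if (zookeeper != null) { zookeeper.destroy(); zookeeper.waitFor(); }
    }
}
```

From JUnit 4 you could call start() in a @BeforeClass method and stop() in an @AfterClass method, so all tests in the class share one broker. Pointing log.dirs in server.properties and dataDir in zookeeper.properties at a throwaway directory keeps each run starting from empty.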

user2108278
  • This is a duplicate question, asked and answered before. See here: http://stackoverflow.com/questions/16284399/purge-kafka-queue – David Griffin Mar 17 '16 at 18:45
  • As I think about it, this might be a slightly larger question. I'm going to answer the larger question and refer back to the specific approaches. – David Griffin Mar 17 '16 at 19:46

1 Answer

There are different approaches to purging individual topics. All of them could be extended to purge all of your topics. However, I think you are asking a larger question related to creating a baseline environment for Kafka, something you would need for testing, perhaps. Or maybe you have a production process that starts from scratch each time. These are actually different scenarios.
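As a sketch of extending single-topic purging to every topic: take the output of kafka-topics.sh --zookeeper localhost:2181 --list and turn each name into a kafka-topics.sh --delete invocation. This assumes a 0.8.2/0.9-era broker with delete.topic.enable=true (without it, topics are only marked for deletion) and ZooKeeper at zkConnect; later Kafka versions take --bootstrap-server instead of --zookeeper. The class TopicPurger is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: one kafka-topics.sh --delete command per topic. */
class TopicPurger {
    static List<List<String>> deleteCommands(String kafkaHome, String zkConnect, List<String> topics) {
        List<List<String>> commands = new ArrayList<>();
        for (String topic : topics) {
            // Leave Kafka's internal offsets topic alone.
            if (topic.equals("__consumer_offsets")) continue;
            commands.add(Arrays.asList(
                    kafkaHome + "/bin/kafka-topics.sh",
                    "--zookeeper", zkConnect,
                    "--delete", "--topic", topic));
        }
        return commands;
    }

    /** Runs each command in turn, echoing the tool's output to the console. */
    static void runAll(List<List<String>> commands) throws IOException, InterruptedException {
        for (List<String> cmd : commands) {
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            if (exit != 0) throw new IOException("command failed: " + cmd);
        }
    }
}
```

Deletion happens asynchronously on the broker, so a test that recreates the same topic names immediately afterwards may still see the old topics for a short while.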

For Testing

If you are talking about testing, then I would do something brute force, on purpose. First, I would configure Kafka to look the way I want it to on startup. Then I would shut it down and back it up, either using tar or possibly even by making a disk image if I am using a VM.

I'd use it and abuse it during testing, then throw it all away when I was done. "Resetting kafka" would just mean restoring either via untar or from a disk image or whatever (rsync even, or just cp from another directory).
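The "cp from another directory" variant can also be driven from Java, so a test setup method can reset Kafka without shell scripts. A minimal sketch, assuming Kafka and ZooKeeper are both stopped while it runs, and that you restore both the broker's log.dirs and ZooKeeper's dataDir (topic metadata lives in ZooKeeper, so resetting only the Kafka logs leaves the two out of sync). The class KafkaSnapshot is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical brute-force reset: replace a data directory with a pristine copy. */
class KafkaSnapshot {
    /** Recursively deletes dir if it exists. */
    static void deleteTree(Path dir) {
        if (!Files.exists(dir)) return;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parents, so directories are empty when deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the tree at source to target; target must not already contain the files. */
    static void copyTree(Path source, Path target) {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits a directory before its contents, so parents exist before files are copied.
            paths.forEach(p -> {
                Path dest = target.resolve(source.relativize(p));
                try {
                    if (Files.isDirectory(p)) Files.createDirectories(dest);
                    else Files.copy(p, dest);
                } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Throws away whatever the tests did and restores the pristine copy. */
    static void restore(Path pristine, Path live) {
        deleteTree(live);
        copyTree(pristine, live);
    }
}
```

For example, a setup method might call restore(pristineKafkaLogs, liveKafkaLogs) and restore(pristineZkData, liveZkData) before starting ZooKeeper and the broker; the four paths are hypothetical and depend on your properties files.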

For testing, I really do want a clean beginning, so I prefer brute force.

During Production

If this is part of your production processes (and I question the wisdom of that on its face), then I would first try not to lose the data. Either include a backup in your process, or don't actually reset the topics.

Topic renaming doesn't exist yet, but you can use the same approach a rename feature is expected to use. Don't deal directly with topic names; keep a dictionary that maps virtual topic names to actual topic names.

Then, instead of "resetting" Kafka each time, create new versions of all of the topics, and update the dictionary to map the virtual topic name to the newly created topic versions.
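A minimal sketch of such a dictionary, assuming a simple "-vN" suffix scheme for the actual topic names. The class TopicDictionary and the naming scheme are hypothetical; in a real deployment the mapping would have to live somewhere every producer and consumer (including the Samza job configs) can read it, not in one process's memory as here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map from virtual topic names to versioned actual topic names. */
class TopicDictionary {
    private final Map<String, Integer> versions = new HashMap<>();

    /** Actual topic that producers and consumers should use right now (version 1 by default). */
    String resolve(String virtualName) {
        return actualName(virtualName, versions.getOrDefault(virtualName, 1));
    }

    /** "Resets" a topic by pointing the virtual name at a fresh version; returns the new actual name. */
    String rollover(String virtualName) {
        int next = versions.getOrDefault(virtualName, 1) + 1;
        versions.put(virtualName, next);
        return actualName(virtualName, next);
    }

    private static String actualName(String virtualName, int version) {
        return virtualName + "-v" + version;
    }
}
```

After rollover("orders") returns "orders-v2", you would create that topic, point the jobs at it, and keep "orders-v1" as a backup or delete it, whichever your retention policy calls for.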

David Griffin
  • It is for testing. There was already a related question here: http://stackoverflow.com/a/30833940/2108278 – user2108278 Mar 21 '16 at 11:27
  • Yes, I referenced that in my answer, but this is slightly higher level. That article handled purging individual topics; I was taking this question at a higher level: how to reset all of Kafka for testing, not just an individual topic. – David Griffin Mar 21 '16 at 11:29