
I'm resiliency testing a kafka connector and I'd like to kill off a worker while it's running, thus killing the connector instance. The easiest way is probably going to be to force distributed mode to run over more than one node, then just kill the worker process on that node (right?). How can I make Kafka connect spawn workers on more than just the node it's started on? Is this something which is defined in worker config?

TheRealJimShady

2 Answers


Yes, handling failures and automatically restarting workloads is exactly what Kafka Connect does. You run it as a cluster, typically with one worker per node. Each worker then runs one or more tasks, as assigned by Connect. If a worker dies, all the tasks it was running are restarted on the remaining available workers, in a load-balanced manner. Check out the architecture reference for more information.

To define workers as members of the same cluster, assign them all the same `group.id`. See the config docs for more info.
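For example, a minimal shared worker config might look like this (broker addresses and topic names are illustrative, not taken from the question):

```bash
# Shared worker config; every node in the cluster uses the same group.id
# and the same three internal topics (values here are illustrative)
cat > connect-distributed.properties <<'EOF'
bootstrap.servers=kafka1:9092,kafka2:9092
# same value on every worker that should join this cluster
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
EOF

# Start a worker on each node; they find each other via the shared group.id
bin/connect-distributed.sh connect-distributed.properties
```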

Robin Moffatt
  • Thank you for answering. I know the expected behaviour of Connect in distributed mode; I'm asking two things: how to ensure workers are distributed to different nodes, and how to kill one off. – TheRealJimShady Nov 08 '17 at 17:17
  • How you deploy _workers_ is up to you, it's part of Connect installation (just make sure you set the `group.id`). It's the _task_ that gets distributed automatically by Kafka Connect, and actually executes the work (pull from source / push to target). So depending on the failure scenario in question either a worker (with zero, one, or many tasks) dies, or a single task. Simulate a worker dying by killing the JVM process. To kill an individual task, I'm not sure if that's possible. – Robin Moffatt Nov 08 '17 at 17:41
  • Sorry for bumping an old question, but this isn't load balancing. Sharing the workload (tasks) would be load balancing; this is more like failover. I tried to achieve load balancing but with no success. Is it even possible? Any references? – Miki Sep 06 '19 at 14:27
  • I'd suggest you start a new question (referencing this one) with clear details of what it is you're trying to do, because it's not clear from your comment precisely what you're after. Thanks. – Robin Moffatt Sep 06 '19 at 15:21
  • @RobinMoffatt - here it is https://stackoverflow.com/questions/57869583/kafka-connector-distributed-load-balancing-tasks – Miki Sep 10 '19 at 11:12

So in the end what I did was:

  • Copied all the JARs I needed for Kafka Connect distributed mode to the two nodes I wanted to run it on (in HDP 2.5.3 you only get those JARs on one node).
  • On both nodes, I ran the distributed-mode start script with a properties file pointing to my JARs (first sketch below).
  • Using the REST interface I posted my connector with one task (second sketch below), and I could see that one worker had the connector instance and another had its task.
  • I killed off the worker running the task (found via `ps -ef | grep connect`; third sketch below), and saw that the task respawned on the remaining node.
  • I reset the test and tried killing off the worker running the connector instance, and to my amazement, the connector instance restarted on the other node.
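First sketch: starting a worker on each node. The paths and the `CLASSPATH` approach below are illustrative for an HDP-style install of that era, where Connect picks up connector JARs from the classpath; adjust for your own layout.

```bash
# On each of the two nodes; paths are illustrative, not from the question
export CLASSPATH=/opt/connectors/jars/*        # the JARs copied to this node
/usr/hdp/current/kafka-broker/bin/connect-distributed.sh \
    /etc/kafka/conf/connect-distributed.properties
```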
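Second sketch: posting the connector over the REST API. Port 8083 is the default; the connector name and class here are hypothetical stand-ins for the connector under test.

```bash
# Submit a connector with a single task to any worker in the cluster
curl -X POST -H "Content-Type: application/json" http://node1:8083/connectors \
  -d '{
        "name": "my-test-connector",
        "config": {
          "connector.class": "com.example.MySourceConnector",
          "tasks.max": "1"
        }
      }'

# See which worker is running the connector instance and which the task
curl http://node1:8083/connectors/my-test-connector/status
```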
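Third sketch: simulating a worker failure by killing its JVM, then confirming from a surviving node that the work has moved. `<PID>` is a placeholder for whatever the `ps` output shows.

```bash
# On the node to "fail": find the Connect worker process and kill it
ps -ef | grep -i connect | grep -v grep
kill -9 <PID>   # placeholder; use the PID from the ps output above

# From a surviving node, confirm the task has respawned
curl http://node2:8083/connectors/my-test-connector/status
```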

To summarise my resiliency testing: Kafka Connect is like playing whack-a-mole; you can kill off tasks or connectors wherever they are, and they will just respawn somewhere else.

TheRealJimShady