
1) We have a 3-node Kafka and Kafka Connect cluster.

2) We run Kafka Connect in distributed mode, on the Kafka nodes only.

3) I am trying to create a connector with the configuration below:

    {
      "name": "connector-state-0",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.user": "user",
        "database.server.id": "5023",
        "database.hostname": "hostname",
        "database.password": "password",
        "database.history.kafka.bootstrap.servers": "ip:9092",
        "database.history.kafka.topic": "topicname",
        "database.server.name": "prod",
        "database.port": "3306",
        "snapshot.mode": "when_needed",
        "include.schema.changes": "false",
        "table.whitelist": "country.state"
      }
    }

The request to create the connector returns the error below on 2 of the 3 nodes:

{"error_code":409,"message":"Cannot complete request because of a conflicting operation (e.g. worker rebalance)"}

On the remaining node I am able to create the connector, but its task does not start and I see the error below in the logs:

[2019-01-23 10:50:06,455] INFO 127.0.0.1 - - [23/Jan/2019:10:50:06 +0000] "POST /connectors/birdeye-connector-state-0/tasks?forward=true HTTP/1.1" 409 113  8 (org.apache.kafka.connect.runtime.rest.RestServer:60)
[2019-01-23 10:50:06,462] INFO 127.0.0.1 - - [23/Jan/2019:10:50:06 +0000] "POST /connectors/birdeye-connector-state-0/tasks HTTP/1.1" 409 113  21 (org.apache.kafka.connect.runtime.rest.RestServer:60)
[2019-01-23 10:50:06,466] ERROR Request to leader to reconfigure connector tasks failed (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)
org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Cannot complete request because of a conflicting operation (e.g. worker rebalance)
    at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:97)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$18.run(DistributedHerder.java:1017)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I am not able to figure out what is causing the issue.

Sahil Gupta
  • Note that running Kafka Connect on the same nodes as Kafka brokers is not recommended. – Robin Moffatt Jan 23 '19 at 11:01
  • When you successfully run it on the one node, and see that error in the log, was anything else happening at the same time? e.g. task rebalance? – Robin Moffatt Jan 23 '19 at 11:02
  • @RobinMoffatt: No ... What could be the possible reasons for the same ? – Sahil Gupta Jan 23 '19 at 11:13
  • @RobinMoffatt: I can see below logs very frequently on the node on which error is coming : Added READ_UNCOMMITTED fetch request for partition connect-configs-0 at offset 233 to node prod-paid-kafka-node-api-1.birdeye.com:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:843) – Sahil Gupta Jan 23 '19 at 11:27
  • Can you check whether it is actually starting your connector (irrespective of the error message)? Try creating the connector from the leader worker. Also make sure that the port set in "rest.advertised.port" in your worker config is not used by any other process on any of the nodes. – suraj_fale Jan 23 '19 at 17:21
  • @SRJ: My connector is starting only on one of the nodes but the worker is not starting .... Will try the thing you said for "rest.advertised.port" ... thanks – Sahil Gupta Jan 25 '19 at 06:33
  • @RobinMoffatt: could you please explain why it is not recommended to run KConnect and KBroker on the same host? How about KBroker & KRestProxy and KConnect & KRestProxy? Thanks. – Averell Aug 02 '19 at 11:36
  • Because of resource contention and component-specific sizing and tuning. You _can_ run them all on the same host, but it's best not to. – Robin Moffatt Aug 02 '19 at 13:13

1 Answer


You need to set rest.advertised.host.name to the host or IP that the other Kafka Connect workers can resolve and connect to. This is because it is used for the internal communication between workers.

If your REST request hits a worker that is not the current leader of the cluster, that worker will try to forward the request to the leader, using the leader's rest.advertised.host.name as the target. But if rest.advertised.host.name is localhost, the worker simply forwards the request to itself, and the request fails. Of your three workers, one will be the leader, which is why you've found that this fails for two out of three.
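As a rough illustration of the point above, the following Python sketch reads the text of a worker properties file and reports whether rest.advertised.host.name is unset, points at loopback, or looks usable by other workers. The helper names, the sample file contents, and the hostname connect-worker-1.example.internal are all hypothetical; only the property keys are real Kafka Connect worker settings. It does not check that the name actually resolves from the other workers.

```python
import ipaddress

# Names that always refer to the local machine.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def classify_advertised_host(text):
    """Return 'unset', 'loopback', or 'ok' for rest.advertised.host.name."""
    host = parse_properties(text).get("rest.advertised.host.name", "")
    if not host:
        return "unset"
    if host.lower() in LOOPBACK_NAMES:
        return "loopback"
    try:
        if ipaddress.ip_address(host).is_loopback:
            return "loopback"
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    return "ok"


# Hypothetical worker file contents; the hostname is a placeholder.
sample = """
bootstrap.servers=ip:9092
group.id=connect-cluster
rest.advertised.host.name=connect-worker-1.example.internal
rest.advertised.port=8083
"""
print(classify_advertised_host(sample))
```

A result of "loopback" on any worker matches the failure described here. What an "unset" value ends up advertising depends on the environment, so it is worth setting the property explicitly on every worker.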

For more details see https://rmoff.net/2019/11/22/common-mistakes-made-when-configuring-multiple-kafka-connect-workers/

Robin Moffatt