
I have a Kafka cluster of 5 nodes running on Kubernetes. Everything was working perfectly, but after a cluster restart the Java Kafka Streams application is no longer able to connect to the cluster and throws the following error:

 java.io.IOException: Can't resolve address: kafka03.kafka:49092
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:235)
at org.apache.kafka.common.network.Selector.connect(Selector.java:214)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchOffsetsByTimes(Fetcher.java:414)
at org.apache.kafka.clients.consumer.internals.Fetcher.beginningOrEndOffset(Fetcher.java:462)
at org.apache.kafka.clients.consumer.internals.Fetcher.endOffsets(Fetcher.java:452)
at org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2041)
at org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2013)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:134)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:233)
... 18 more

What I don't understand is that if I open a shell in the pod, it is able to ping the given host. Do you have any clue? I suspect that Java NIO hostname resolution takes a different approach from what the system ping does... but I cannot get to the root of the problem (which may be k8s DNS; however, all the DNS tests pass successfully).
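The trace ends in `sun.nio.ch.Net.checkAddress`, which throws `UnresolvedAddressException` when the `InetSocketAddress` handed to `SocketChannel.connect` was never resolved, so no packet is sent at all. A minimal JDK-only sketch that runs a lookup from inside a JVM (unlike `ping`, which uses its own process) and reproduces the exception; the class name and the `kafka03.kafka:49092` defaults are just for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class ResolveCheck {

    // Resolve through InetAddress, which goes through the system resolver
    // (hosts file, resolv.conf) and the JVM's own DNS cache.
    static String lookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null; // lookup failed from inside this JVM
        }
    }

    // Connecting an unresolved InetSocketAddress fails before any network I/O,
    // with the same UnresolvedAddressException seen in the stack trace.
    static boolean connectThrowsUnresolved(String host, int port) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved(host, port);
        try (SocketChannel ch = SocketChannel.open()) {
            ch.configureBlocking(false); // non-blocking mode, as used with a java.nio Selector
            ch.connect(addr);
            return false;
        } catch (UnresolvedAddressException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "kafka03.kafka";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 49092;

        String ip = lookup(host);
        System.out.println("InetAddress lookup of " + host + ": " + (ip == null ? "FAILED" : ip));

        // new InetSocketAddress(host, port) attempts resolution immediately;
        // isUnresolved() reports whether that attempt succeeded.
        System.out.println("isUnresolved(): " + new InetSocketAddress(host, port).isUnresolved());
        System.out.println("connect(unresolved) throws UnresolvedAddressException: "
                + connectThrowsUnresolved(host, port));
    }
}
```

Running it in the Streams application's pod (on Java 11+ a single file can be launched directly with `java ResolveCheck.java kafka03.kafka 49092`) shows whether a fresh JVM can resolve the broker name. If it can while the long-running application cannot, that would point at state held inside the application's JVM rather than at cluster DNS.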

Thanks

UPDATE: this is what shows up in the CoreDNS logs:

2019-12-16T13:11:43.970Z [INFO] 10.42.18.7:52692 - 27004 "A IN kafka03.kafka.kafka.svc.cluster.local udp 31 false 512" NXDOMAIN qr,rd,ra 106 0.001590748s
2019-12-16T13:11:43.972Z [INFO] 10.42.18.7:35926 - 4474 "A IN kafka05.kafka.kafka.svc.cluster.local. udp 55 false 512" NXDOMAIN qr,aa,rd 148 0.000447703s
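The first log line queries `kafka03.kafka.kafka.svc.cluster.local`, i.e. the short name `kafka03.kafka` with a search domain appended. Inside a pod, `/etc/resolv.conf` normally has a `search` list and an `ndots` option (Kubernetes uses `ndots:5` by default); a name with fewer dots than `ndots` is tried against each search suffix before being tried as-is, while a name ending in `.` is never expanded. The sketch below models that ordering (glibc-style; musl-based images differ in details) without querying DNS. The search list `kafka.svc.cluster.local svc.cluster.local cluster.local` is an assumption based on the usual layout for a pod in a namespace named `kafka`, and `candidates` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;

public class SearchOrder {

    // Hypothetical helper: the names a glibc-style stub resolver would query,
    // in order, for `name`, given the resolv.conf search list and ndots value.
    static List<String> candidates(String name, List<String> search, int ndots) {
        List<String> out = new ArrayList<>();
        if (name.endsWith(".")) {          // fully qualified: no search expansion
            out.add(name);
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= ndots) {
            out.add(name);                 // enough dots: try the name as-is first
        }
        for (String suffix : search) {
            out.add(name + "." + suffix);  // then each search domain in order
        }
        if (dots < ndots) {
            out.add(name);                 // too few dots: as-is only after the search list
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed search list for a pod in the "kafka" namespace with ndots:5.
        List<String> search = List.of("kafka.svc.cluster.local", "svc.cluster.local", "cluster.local");
        candidates("kafka03.kafka", search, 5).forEach(System.out::println);
    }
}
```

Under these assumptions the first candidate is exactly the name in the log. Whether an NXDOMAIN for it is expected depends on how the brokers are exposed: if `kafka03` is a pod whose hostname/subdomain point at a headless service named `kafka` in namespace `kafka` (as with a StatefulSet), that name is the pod's own DNS record and should resolve, so the service and pod records are worth checking after the restart.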
MarcoAbi
  • Looks like a problem with the `advertised.listeners` param. What have you set there? Try googling this param together with Kubernetes. – Arkadiusz Łukasiewicz Dec 16 '19 at 15:26
  • Ping doesn't do port checking – OneCricketeer Dec 17 '19 at 01:31
  • I would assume that the IP changed, but your application has cached the resolved IP in the JVM DNS cache. Hence, restarting the Kafka Streams application should resolve the issue. You might also want to disable DNS caching (or at least reduce the cache retention time): https://stackoverflow.com/questions/1256556/how-to-make-java-honor-the-dns-caching-timeout – Matthias J. Sax Dec 26 '19 at 19:03
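On the `advertised.listeners` comment: a Kafka client uses the bootstrap servers only to fetch cluster metadata, and afterwards connects to whatever `host:port` each broker advertises, which is presumably where `kafka03.kafka:49092` in the trace comes from. A JDK-only sketch that takes advertised endpoints (with or without a `NAME://` listener prefix) and reports which ones this JVM can resolve; `AdvertisedCheck`, `parse` and `check` are hypothetical names, and the real endpoint list has to be copied from the brokers' configuration:

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdvertisedCheck {

    // Hypothetical helper: turn "PLAINTEXT://host:port" or "host:port" into an
    // InetSocketAddress; the constructor attempts DNS resolution immediately.
    static InetSocketAddress parse(String endpoint) {
        int scheme = endpoint.indexOf("://");
        String hostPort = scheme >= 0 ? endpoint.substring(scheme + 3) : endpoint;
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + endpoint);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    // Map each endpoint to true if this JVM resolved its host, false otherwise.
    static Map<String, Boolean> check(List<String> endpoints) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String ep : endpoints) {
            result.put(ep, !parse(ep).isUnresolved());
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the advertised endpoints of all brokers as arguments.
        List<String> endpoints = args.length > 0 ? List.of(args) : List.of("kafka03.kafka:49092");
        check(endpoints).forEach((ep, ok) ->
                System.out.println(ep + " -> " + (ok ? "resolved" : "UNRESOLVED")));
    }
}
```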
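On the "ping doesn't do port checking" comment: a successful `ping` shows that the `ping` process resolved the name and that ICMP gets through, but not that the broker port accepts TCP connections. A minimal JDK-only TCP check, run from the same pod as the Streams application, exercises both resolution and the port; the class name and the 2-second timeout are arbitrary choices for illustration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // Unlike ping (ICMP), this needs both name resolution and an open port.
    static boolean canConnect(String host, int port, int timeoutMs) {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (addr.isUnresolved()) {
            return false; // the same condition behind UnresolvedAddressException
        }
        try (Socket s = new Socket()) {
            s.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, unreachable, ...
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "kafka03.kafka";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 49092;
        System.out.println(host + ":" + port + " reachable over TCP: " + canConnect(host, port, 2000));
    }
}
```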
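On the DNS-caching comment: the JVM caches successful lookups for `networkaddress.cache.ttl` seconds and failed ones for `networkaddress.cache.negative.ttl` seconds. Both are `java.security` properties, so besides editing the `java.security` file they can be set with `java.security.Security.setProperty`, as long as that happens before the JVM's first lookup (the cache policy is typically read only once). A sketch of the programmatic route; `configureDnsCache` is a hypothetical helper, and the 30 s / 5 s values are arbitrary examples, not values suggested in the thread:

```java
import java.security.Security;

public class DnsCacheConfig {

    // Call before any hostname lookup in this JVM (e.g. first thing in main,
    // before building the KafkaStreams instance).
    static void configureDnsCache(int positiveTtlSeconds, int negativeTtlSeconds) {
        // How long successful lookups stay cached.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        // How long failed lookups (e.g. NXDOMAIN during a restart) stay cached.
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        configureDnsCache(30, 5);
        System.out.println("networkaddress.cache.ttl = " + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl = "
                + Security.getProperty("networkaddress.cache.negative.ttl"));
        // ... then build and start the Kafka Streams application as usual.
    }
}
```

A command-line alternative for the positive TTL is the older JDK-specific system property `-Dsun.net.inetaddr.ttl=30`.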
