Several times a week, in both Production and Test and for no reason we can identify, we lose the ability to communicate with one Kafka broker, and this message repeats in its log:

WARN Connection to node nnnn could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Strangely, this in turn stops Kafka working altogether (we cannot produce or consume), yet neither OpenShift nor Kafka itself recognizes that the broker is not working.
I am about to add a livenessProbe to the YAML so that the pod is restarted when a command run inside the broker container fails (see the sketch after this paragraph), but naturally we would rather find the root cause.
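For reference, this is roughly the probe I have in mind; it is only a sketch. The script path, port and timings are assumptions based on a typical Kafka image and would need adapting to ours:

```yaml
# Sketch of a livenessProbe for the broker container in the StatefulSet/Deployment spec.
# Assumes the standard Kafka CLI scripts ship with the image under /opt/kafka/bin and
# that the broker has a plaintext listener on localhost:9092 -- adjust to the real image.
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
  initialDelaySeconds: 120   # give the broker time to start and rejoin the cluster
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 5        # several consecutive failures before the kubelet restarts the pod
```

The idea is simply that if the broker stops answering even a local metadata request, the kubelet restarts the pod, which is what we currently have to do by hand.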
If I run curl host:port from another broker or ZooKeeper pod, I get a reply from every other broker and ZooKeeper. But curl to the Kafka pod that has "failed" returns "Could not resolve host ...", even though I can still go into OpenShift and open a Terminal on that pod. I cannot find any other errors in the logs.
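Since "Could not resolve host" points at name resolution rather than the broker process itself, this is roughly how I have been trying to narrow it down from one of the other pods while the problem is occurring. The pod, service and namespace names below are placeholders, and the tools may or may not be present in the images:

```bash
# Run from another broker/ZooKeeper pod (oc rsh <pod>) while the broker is "failed".
# kafka-0, kafka-headless and my-namespace are placeholders -- substitute the real names.

# Does cluster DNS still resolve the failed broker's name?
getent hosts kafka-0.kafka-headless.my-namespace.svc.cluster.local

# Same lookup, shown with the resolver that answered, to separate a cluster-DNS
# problem from a stale /etc/resolv.conf inside the calling pod:
nslookup kafka-0.kafka-headless.my-namespace.svc.cluster.local

# If the name does resolve, test the listener port directly instead of HTTP:
nc -vz kafka-0.kafka-headless.my-namespace.svc.cluster.local 9092
```

My working assumption is that if the lookup fails only for that one pod's name, it is more likely an OpenShift/DNS problem than a Kafka one, but I have not been able to confirm that yet.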
I don't know if this is a Kafka or OpenShift/Kubernetes issue.
If anyone else has had this and resolved it, I'd be grateful for some pointers.