
I've recently been using NestJS and KafkaJS for my microservice project, and after each server refresh (debug mode) KafkaJS takes too long to connect to the Kafka cluster again. Is there some configuration I'm missing?

My KafkaJS version is 2.2.3 and I have 3 brokers in my cluster. The cluster runs on my company's servers, so network latency is negligible. This is my KafkaJS configuration:


    client: {
      username,
      password,
      clientId,
      brokers: [`${host}:${port}`],
      authenticationTimeout: 10000,
      reauthenticationThreshold: 5,
    },
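
For context, this client block sits inside a NestJS Kafka microservice bootstrap roughly like the following minimal sketch (the module name, environment-variable names, SASL mechanism, and consumer group id are placeholders, not my exact code):

    // Minimal sketch of the surrounding NestJS bootstrap (placeholders, not exact code).
    import { NestFactory } from '@nestjs/core';
    import { MicroserviceOptions, Transport } from '@nestjs/microservices';
    import { AppModule } from './app.module';

    async function bootstrap() {
      const app = await NestFactory.create(AppModule);

      app.connectMicroservice<MicroserviceOptions>({
        transport: Transport.KAFKA,
        options: {
          client: {
            clientId: 'my-service',
            brokers: [`${process.env.KAFKA_HOST}:${process.env.KAFKA_PORT}`],
            authenticationTimeout: 10000,
            reauthenticationThreshold: 5,
            // Note: KafkaJS expects credentials under a `sasl` block rather than
            // as top-level `username`/`password` fields; the mechanism here is a guess.
            sasl: {
              mechanism: 'plain',
              username: process.env.KAFKA_USERNAME ?? '',
              password: process.env.KAFKA_PASSWORD ?? '',
            },
          },
          consumer: {
            groupId: 'my-consumer-group', // the group that has to rebalance on every restart
          },
        },
      });

      await app.startAllMicroservices();
      await app.listen(3000);
    }

    bootstrap();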

I suspect there might be something wrong with Kafka consumer group rebalancing, but I don't know whether I've misconfigured something.

And this is my Kafka docker-compose file:


    services:
      zookeeper:
        image: 'zookeeper:3.6.2'
        container_name: kiz-zookeeper
        ports:
          - '${ZOOKEEPER_PORT}:2181'
        volumes:
          - 'zookeeper-data:/data'
          - 'zookeeper-txn-logs:/txn-logs'
          - 'zookeeper-log:/datalog'

      kafka1:
        image: 'bitnami/kafka:latest'
        container_name: kiz-kafka-cluster-1
        environment:
          KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:${ZOOKEEPER_PORT}
          ALLOW_PLAINTEXT_LISTENER: "yes"
          KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://:${KAFKA_1_PORT}
          KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka1:9092,EXTERNAL://172.16.100.211:${KAFKA_1_PORT}
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_HEAP_OPTS: -Xmx${KAFKA_RAM}m
        ports:
          - '${KAFKA_1_PORT}:9093'
        depends_on:
          - zookeeper
        volumes:
          - 'kafka1-data:/bitnami/kafka/data'

      kafka2:
        image: 'bitnami/kafka:latest'
        environment:
          KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:${ZOOKEEPER_PORT}
          ALLOW_PLAINTEXT_LISTENER: "yes"
          KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://:${KAFKA_2_PORT}
          KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka2:9092,EXTERNAL://172.16.100.211:${KAFKA_2_PORT}
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_HEAP_OPTS: -Xmx${KAFKA_RAM}m
        ports:
          - '${KAFKA_2_PORT}:9095'
        depends_on:
          - zookeeper
        volumes:
          - 'kafka2-data:/bitnami/kafka/data'

      kafka3:
        image: 'bitnami/kafka:latest'
        container_name: kiz-kafka-cluster-3
        environment:
          KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:${ZOOKEEPER_PORT}
          ALLOW_PLAINTEXT_LISTENER: "yes"
          KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://:${KAFKA_3_PORT}
          KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka3:9092,EXTERNAL://172.16.100.211:${KAFKA_3_PORT}
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_HEAP_OPTS: -Xmx${KAFKA_RAM}m
        ports:
          - '${KAFKA_3_PORT}:9097'
        depends_on:
          - zookeeper
        volumes:
          - 'kafka3-data:/bitnami/kafka/data'

    volumes:
      zookeeper-data:
      zookeeper-txn-logs:
      zookeeper-log:
      kafka1-data:
      kafka2-data:
      kafka3-data:

  • Can you be more specific? What are we to make of the statement `take too long to connect`: minutes? hours? days? What is your expectation for the connection time? Do other frameworks or clients connect faster? Are you refreshing the client server or the Kafka broker server? – Chris Doyle Apr 16 '23 at 09:51
  • 1) You shouldn't run 3 brokers on the same machine with Compose. 2) What do you mean by "company servers" when you've shown a Compose file, which should be running locally? Either way, this doesn't determine latency. 3) What are `host` and `port` in your JS code? Have you tried following this? https://stackoverflow.com/questions/51630260/connect-to-kafka-running-in-docker – OneCricketeer Apr 16 '23 at 13:35
  • @ChrisDoyle It takes about two minutes. I did not test with other frameworks. If by `client server` you mean my NestJS server, I refresh it rapidly, but I am not refreshing my Kafka brokers. – mmRoshani Apr 17 '23 at 06:30
  • @OneCricketeer There are servers in the building that I can see; these servers are used only for development. I don't understand why I shouldn't run 3 Kafka brokers on one server for development purposes, please explain. Keep in mind that I develop my backend (with NestJS) on my local machine in the local network. The port is `9092`. – mmRoshani Apr 17 '23 at 06:38
  • The 3 brokers are all sharing one CPU and one disk. It's still a single point of failure... so replication is pointless. Your code will still work with one broker. Your real Kafka cluster can use 3 or more brokers, but it shouldn't be run from Docker Compose. Regarding the networking, [have you read this completely](https://stackoverflow.com/questions/51630260/connect-to-kafka-running-in-docker)? Can you test and reproduce the issue outside of JavaScript code using native Kafka tools? – OneCricketeer Apr 17 '23 at 12:10
  • @OneCricketeer Take a look at the answer I posted if you're facing the same problem; I hope it helps you. – mmRoshani May 15 '23 at 20:07
  • @ChrisDoyle Take a look at the answer I posted if you're facing the same problem; I hope it helps you. – mmRoshani May 15 '23 at 20:07

1 Answer


After a while, I finally found the relevant issue on the KafkaJS GitHub page.

Here is Nevon's answer, which I will quote:

What you need to keep in mind is that the consumer group exists outside of your node instances. Whenever a consumer joins or leaves the group, all the members of the group have to re-join and sync before they can start consuming again.

When you start your node process, your consumer will join that consumer group, and when you quit the process (with graceful disconnect), your consumer will leave the consumer group. Like mentioned above, this means the group has to rebalance (re-join and sync). When you just exit the process without disconnecting, your consumer doesn't actually leave the group - so initially there won't be a rebalance. However, after sessionTimeout that consumer will be considered unhealthy and will be kicked out of the group, which will trigger a rebalance. In the in-between time, no processing will happen on any partitions that are assigned to that consumer.

If you are developing with nodemon, causing frequent restarts, a solution is probably to generate a random groupId on each restart. When static membership (#884) becomes available, that could be another option, but it's not available yet, so for now a random group id is probably the best bet. Either that, or you have to wait for the rebalance to happen.
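
To make that workaround concrete, here is a minimal sketch of a consumer that generates a random group id while developing (the environment check, group-id prefix, topic name, and broker address are placeholders of mine, not part of Nevon's answer):

    // Minimal sketch of the "random group id in development" workaround (kafkajs 2.x).
    // The NODE_ENV check, group-id prefix, topic, and broker address are placeholders.
    import { Kafka } from 'kafkajs';
    import { randomUUID } from 'crypto';

    const kafka = new Kafka({
      clientId: 'my-service',
      brokers: ['172.16.100.211:9093'], // use your broker's advertised host:port
    });

    // In development every restart joins a brand-new group, so the consumer does not
    // have to wait out the sessionTimeout of the previous (now dead) group member.
    const isDev = process.env.NODE_ENV !== 'production';
    const groupId = isDev ? `my-service-dev-${randomUUID()}` : 'my-service';

    const consumer = kafka.consumer({
      groupId,
      sessionTimeout: 30000, // how long a silent member stays in the group before it is evicted
    });

    async function run() {
      await consumer.connect();
      await consumer.subscribe({ topics: ['my-topic'], fromBeginning: false });
      await consumer.run({
        eachMessage: async ({ topic, partition, message }) => {
          console.log(`${topic}[${partition}] ${message.value?.toString()}`);
        },
      });
    }

    run().catch(console.error);

The trade-off is that each restart starts from the latest offsets in a fresh group and leaves stale, empty groups behind on the brokers, which is usually acceptable in development but not something you'd want in production.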

Hope that helps and best regards.
