1

My Kafka clients are running in the GCP App Engine Flex environment with autoscaling enabled (GCP keeps the instance count at a minimum of two, and it has mostly stayed at 2 due to low CPU usage). The consumer groups running on those 2 VMs have been consuming messages from various topics with 20 partitions for several months. Recently I noticed that the partitions of older topics had shrunk to just 1 (!) and the offsets for that consumer group were reset to 0. The topic-[partition] directories were also gone from the kafka-logs directory. Strangely, the partitions of recently created topics are intact. I have 3 different environments (all in GCP) and this happened in all three. We didn't see any lost messages or data problems, but we want to understand what happened so we can keep it from happening again.
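For reference, this is roughly how I checked the partition count and the group's committed offset from the 0.10.1.1 Java client; the broker address, topic name, and group name below are placeholders for my actual values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;

    public class PartitionCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker-host:9092"); // placeholder broker address
            props.put("group.id", "my-group");                   // placeholder consumer group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Ask the broker how many partitions it currently reports for the topic.
                for (PartitionInfo p : consumer.partitionsFor("my-topic")) {
                    System.out.println("partition " + p.partition() + ", leader " + p.leader());
                }
                // Check which offset this group has committed on partition 0.
                OffsetAndMetadata committed = consumer.committed(new TopicPartition("my-topic", 0));
                System.out.println("committed offset: "
                        + (committed == null ? "none" : String.valueOf(committed.offset())));
            }
        }
    }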

The Kafka broker and ZooKeeper are running on the same, single GCP Compute Engine instance (I know it's not best practice and I plan to improve it), and I suspect it has something to do with a machine restart wiping out some information. However, I verified that the data files are written under the /opt/bitnami/(kafka|bitnami) directory and not under /tmp, which can be wiped by machine restarts.

  • Spring Kafka 1.1.3
  • Kafka client 0.10.1.1
  • single-node Kafka broker 0.10.1.0
  • single-node ZooKeeper 3.4.9

Any insights on this will be appreciated!

nanaboo
  • What does this have to do with spring-kafka? – Gary Russell Dec 15 '17 at 21:09
  • I am using spring-kafka to both produce and consume messages. I just listed it for visibility, hoping to get some feedback. If it has absolutely nothing to do with the symptom described above, I should remove it. – nanaboo Dec 15 '17 at 21:31
  • I was able to reproduce this. By restarting the broker VM, the partition count comes down to 1 and the offset starts at 0, so it appears to be a broker configuration/startup issue. The spring-kafka client is off the hook. Gary, sorry for the false alarm. Will delete the tag. – nanaboo Dec 16 '17 at 00:43

1 Answer

1

Bitnami developer here. I could reproduce the issue and track it down to an init script that was clearing the content of the tmp/kafka-logs/ folder.

We released a new revision of the Kafka installers, virtual machines, and cloud images that fixes the issue. The revision that includes the fix is 1.0.0-2.
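If you want to double-check which directory your broker is actually persisting to, a minimal sketch like the one below prints the configured log.dirs; the server.properties path is just an example, so point it at the config file of your own install:

    import java.io.FileInputStream;
    import java.util.Properties;

    public class LogDirCheck {
        public static void main(String[] args) throws Exception {
            // Example path only; adjust to where server.properties lives on your install.
            String configPath = "/opt/bitnami/kafka/config/server.properties";

            Properties broker = new Properties();
            try (FileInputStream in = new FileInputStream(configPath)) {
                broker.load(in);
            }
            // If log.dirs resolves to a tmp/ location that an init script clears,
            // every topic-partition directory disappears after a restart.
            System.out.println("log.dirs = " + broker.getProperty("log.dirs"));
        }
    }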

  • Thanks for the patch. However, there was more to it than that: it was also removing all the ZooKeeper data (!) on VM restarts. https://community.bitnami.com/t/kafka-clear-topics-after-reboot-vm/54069 – nanaboo Dec 26 '17 at 20:29