1

I have a spark structured streaming application consuming from kafka, for this application I would like to monitor the consumer lag. I 'm using below command to check consumer lag. However I don't get the CURRENT-OFFSET and hence LAG is blank too. Is this expected ? It works for other python based consumers.

Command

kafka-consumer-groups --bootstrap-server <bootstrap-server>:<port> --describe --all-groups

Output

GROUP                                                                TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG              CONSUMER-ID                                     HOST            CLIENT-ID
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 21         -               5546            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 7          -               5129            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 3          -               5178            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 9          -               4969            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 2          -               5443            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
spark-kafka-source-b5e8d872-f727-4ed0-a82c-a3d279647942-407459747-driver-0 my_topic 15         -               5312            -               consumer-3-bc651181-fc62-4b1a-abdf-fb3e9d244df8 /<ip-address>    consumer-3
Michael Heil
  • 16,250
  • 3
  • 42
  • 77
fuubarbaz
  • 13
  • 4
  • I'm maintaining the project (https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer) which does the commit with "custom" consumer group ID. (So that you can commit the offset to the group you want instead of uniquely generated one.) As the below answer explains, Spark doesn't, and you shouldn't try to modify the behavior Spark fully handles the offset information and doesn't give it to Kafka. – Jungtaek Lim Jan 23 '21 at 02:38

1 Answers1

1

"However I don't get the CURRENT-OFFSET and hence LAG is blank too. Is this expected?"

Yes, this is the expected behavior as Spark Structured Streaming applications are not committing any offsets back to Kafka. Therefore, the current offset and the lag of this consumer group will not be stored in Kafka and you will see exactly the result of the consumer-groups tool what you have shown.

I have written a more comprehensive answer on Consumer Group and how Spark Structured Streaming applications manage Kafka offsets here.

Michael Heil
  • 16,250
  • 3
  • 42
  • 77