
What is the best way to check how many events are consumed by an application from a Kafka topic during a time window?

Currently I am doing this:

$ ./bin/kafka-console-consumer.sh --zookeeper zookeeper:2181 --topic topic --from-beginning | grep -i '2018-05-29' > kafka.out
$ wc -l kafka.out

Some issues:

  1. It requires a timestamp of some sort to be in the payload (this is okay in this case)
  2. It starts from the first offset so you need to wait some time for the file to populate, depending on the throughput of your topic

Any better ways to do this? Preferably command line as it is used for ad-hoc analysis.

I am using Kafka 0.10 but any answers for newer versions would also be good to know.

Thanks

bp2010
  • In Kafka you have producers and consumers. Counting the number of "processed" events does not make sense. You can either count the number of messages produced by an application INSIDE that application, or the number of messages consumed by another application. – Harold May 30 '18 at 11:20
  • Also a Kafka message may be any format and may contain no date at all. So there's no built-in tool to achieve what you expect. – Harold May 30 '18 at 11:27
  • Yes, you are right. It didn't make sense the way I wrote it, thanks for the clarification. I edited the post to specify `consumed by an app` – bp2010 May 30 '18 at 11:58
  • I wanted to see if using the `timestamp` from within the kafka metadata or similar would be possible – bp2010 May 30 '18 at 12:05
  • @Harold Kafka 0.10 includes a timestamp with every event. Add `--property print.timestamp=true` – OneCricketeer May 30 '18 at 12:37
  • The real way to do this would be within the consumer applications (for a consumer group), or by exposing JMX on the Kafka cluster and pointing an external monitoring solution at it – OneCricketeer May 30 '18 at 12:39
  • Apologies for my mistake regarding the timestamp. You can monitor the offsets https://stackoverflow.com/questions/28579948/java-how-to-get-number-of-messages-in-a-topic-in-apache-kafka but short of consuming the messages with a small Java program I don't see any quick solution. – Harold May 30 '18 at 12:47
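
To make the `print.timestamp` suggestion concrete, here is a rough command-line sketch. It assumes the console consumer prefixes each line with `CreateTime:<epoch-millis>` followed by a tab when `print.timestamp=true` is set; check the output of your version before relying on it. The function name `count_in_window` and the window bounds are made up for illustration.

```shell
# count_in_window START_MS END_MS
# Reads console-consumer output on stdin. Assumes every line starts with
# "CreateTime:<epoch-millis>" followed by a tab (an assumption about the
# print.timestamp=true format), and prints how many lines have a timestamp
# in the half-open window [START_MS, END_MS).
count_in_window() {
  awk -F'\t' -v start="$1" -v end="$2" '
    {
      split($1, parts, ":")   # parts[2] holds the epoch-millis timestamp
      ts = parts[2] + 0
      if (ts >= start && ts < end) n++
    }
    END { print n + 0 }       # prints 0 when nothing matched
  '
}

# Hypothetical usage for 2018-05-29 UTC (bounds are epoch milliseconds):
# ./bin/kafka-console-consumer.sh --zookeeper zookeeper:2181 --topic topic \
#   --from-beginning --property print.timestamp=true \
#   | count_in_window 1527552000000 1527638400000
```

This removes the need for a date inside the payload, but it still reads the topic from the first offset, and the count only prints once the consumer exits (a `--timeout-ms` option may help if your version has it).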

1 Answer

It's unclear if you want to find this information from the broker side or from your consumer.

In case it's OK from the consumer side, you can check the following metric:

kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}"

It has a records-consumed-total attribute that indicates how many records the consumer instance has received.
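
Since `records-consumed-total` is a running counter, the number consumed during a window would be the difference between a reading at the end and a reading at the start. A minimal sketch, assuming you have already sampled the attribute into `time,value` CSV lines (for example with a JMX command-line tool such as `kafka.tools.JmxTool`, whose exact output format I am assuming here); `consumed_delta` is a hypothetical helper name:

```shell
# consumed_delta
# Reads "time,value" samples of records-consumed-total on stdin, skips any
# line whose second field is not a plain number (e.g. a CSV header), and
# prints last value minus first value.
consumed_delta() {
  awk -F',' '
    $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
      if (!seen) { first = $2; seen = 1 }
      last = $2
    }
    END { if (seen) printf "%d\n", last - first; else print 0 }
  '
}

# Hypothetical usage, with samples.csv collected over the window of interest:
# consumed_delta < samples.csv
```

Keep in mind the counter is per consumer instance and starts again from zero when the client restarts, so this only holds for a window in which the consumer stayed up, and only on client versions that expose the attribute.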

Mickael Maison