I am trying to check the size of the queue in Kafka for certain topics at regular time intervals. However, I can't figure out how to check that metric even once. I'm completely new to Kafka, so I'm not sure exactly what to do for this. I assume that it will involve creating either a producer or a consumer to interact with the queue, but I've hit a roadblock.
7
-
See the answer to a similar question here: http://stackoverflow.com/questions/28579948/how-to-get-number-of-messages-in-a-topic-in-apache-kafka-by-java/28617771#28617771 – Lundahl Nov 03 '15 at 08:18
2 Answers
3
I think it is not possible at the moment. You should think of a Kafka topic as an infinite data stream, so the only option you have, IMO, is to count consumed messages in your consumers.
You can use the Kafka offset monitoring tool, which shows the log size per topic partition (you have to sum the partitions up yourself): http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/
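To make the "sum up per partition" step concrete, here is a minimal sketch in Python. It assumes a newer kafka-python release than the one discussed in this thread, one whose `KafkaConsumer` offers `partitions_for_topic`, `beginning_offsets` and `end_offsets`, plus a broker version that supports those requests. The function names, the topic name and the broker address are placeholders made up for illustration. The arithmetic lives in its own function so it can be checked without a running cluster.

```python
def retained_messages(beginning, end):
    """Sum (end offset - beginning offset) over all partitions.

    Both arguments map a partition key to an offset. The result counts the
    offsets currently retained in the log; for compacted topics it is only
    an upper bound on the number of messages.
    """
    return sum(end[p] - beginning.get(p, 0) for p in end)


def fetch_topic_size(topic, bootstrap_servers="localhost:9092"):
    """Ask a broker for the offsets. Needs kafka-python and a live cluster.

    Assumption: the installed kafka-python version provides
    beginning_offsets() and end_offsets() on KafkaConsumer.
    """
    from kafka import KafkaConsumer, TopicPartition  # imported lazily on purpose

    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    try:
        partitions = consumer.partitions_for_topic(topic) or set()
        tps = [TopicPartition(topic, p) for p in partitions]
        return retained_messages(consumer.beginning_offsets(tps),
                                 consumer.end_offsets(tps))
    finally:
        consumer.close()
```

No messages are consumed and no consumer group is involved here, so a call such as `fetch_topic_size("A")` should not disturb a service that is already reading the topic.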

codejitsu
-
So, I'm using the kafka-python interface. Is there any way to have your consumer try to consume all messages in the queue? As in, is there a set max amount of messages I could try to consume with – user2419509 Nov 03 '15 at 14:25
-
@user2419509 yes, if you set the "auto.offset.reset" property in your consumer to "smallest", you will consume all messages from the beginning. I do not know how this works in kafka-python, but usually you can 'slice' your stream like: stream.slice(0, maxMessages) – codejitsu Nov 03 '15 at 14:47
-
Thanks for that. From what I understand, in order to consume all the messages myself I would have to create a new consumer group, which will be able to consume all of the same messages as the existing consumer group on the same topic. Is that correct? Also, can I just add another entry for 'group.id' in my current consumer.config file, or would that mess around with the existing consumer? – user2419509 Nov 03 '15 at 15:51
-
I say this because if I have multiple consumers in one consumer group, then they won't all get all of the messages in that topic. Is this assumption right? – user2419509 Nov 03 '15 at 15:52
-
@user2419509 AFAIK you don't need to change the group name - just set the auto.offset.reset property to "smallest". But if you want to preserve the old offsets, then of course you are free to define a new consumer group. In that case the new group will read all messages from the beginning and will have its own offsets. – codejitsu Nov 03 '15 at 16:04
-
So I don't want to mess with the settings of the current consumer, because it is a service that is already up and running, but I want to extract information about the offset, the number of messages, or how the offset has changed in a certain period of time. So my only two options seem to be either a new consumer group that consumes the messages as well, or just checking how much the offset changes between reads of the queue. Would simply checking the offsets be possible? – user2419509 Nov 03 '15 at 16:13
-
@user2419509 this is actually the problem - there is no simple way to do it now, IMHO. Another option could be the Kafka offset monitoring tool itself - you can find the source code on GitHub if you want. This tool can definitely get the topic size. – codejitsu Nov 03 '15 at 16:39
-
Does the offset monitoring tool have a command-line interface, so that I don't have to use the graphical interface? I only have ssh access to this machine, so there is no possibility of a GUI. – user2419509 Nov 03 '15 at 16:59
-
So would this be possible: I create a new consumer with its own group and have it start consuming topic 'A'. I set 'auto_offset_reset' to 'largest' so that I start reading at the most recent message, then I try to consume all new messages every second or two, and use the difference between the previous offset and the current offset to see, effectively, 'the number of messages in the topic' at each time period. – user2419509 Nov 03 '15 at 17:02
-
@user2419509 yes, the offset monitoring tool is a command-line tool without a GUI. – codejitsu Nov 03 '15 at 17:09
-
Yeah, but it runs an app on port 8080, which I am not able to access remotely. – user2419509 Nov 03 '15 at 17:26
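The polling idea raised in the comments above (compare the offsets seen at one reading with those seen at the next) can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_end_offsets` is a hypothetical callable that returns a `{partition: offset}` dictionary from whatever source is available, and the function names are made up. The result is the number of messages appended to the topic in each interval, not the backlog of any consumer.

```python
import time


def offset_deltas(previous, current):
    """Per-partition growth between two snapshots of log-end offsets.

    A partition that appears only in the newer snapshot is reported as 0,
    because there is no earlier value to compare against.
    """
    return {p: current[p] - previous.get(p, current[p]) for p in current}


def poll_growth(read_end_offsets, interval_seconds=2.0, rounds=3,
                sleep=time.sleep):
    """Take rounds + 1 snapshots and return the total growth per interval.

    read_end_offsets is a hypothetical callable supplied by the caller;
    sleep is injectable so the loop can be exercised without waiting.
    """
    totals = []
    previous = read_end_offsets()
    for _ in range(rounds):
        sleep(interval_seconds)
        current = read_end_offsets()
        totals.append(sum(offset_deltas(previous, current).values()))
        previous = current
    return totals
```

Because the loop only reads offsets, it needs neither a new consumer group nor a change to the configuration of the existing consumer.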
1
- If you want to know how many messages are left to consume, by topic and by partition: programmatically, you have to query Zookeeper if you are using the high-level consumer client. Data related to the current offset position is stored under the path /kafka/consumers. Take a look at the Kafka Offset Monitor tool. It will give you an idea of the kind of data that is stored in ZK. This behavior will change in the next release, 0.9.0, because write-intensive use of ZK is not an optimal use case.
- If you want to know how many messages there are in total in the topic: you have to count them yourself with consumers, or mirror the messages to another Kafka cluster dedicated to analytic purposes (stats, counts, anything).
The notion of queue size in Kafka is irrelevant because it is not a queue but a log. You can consume, rewind, and jump to any offset as you wish.
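For the first case (messages left to consume), the arithmetic is the log-end offset minus the consumer group's committed offset, per partition. The Python sketch below shows only that step; how the two dictionaries are filled (from Zookeeper, from the brokers, or from a monitoring tool) is deliberately left out, and the function name is made up for illustration.

```python
def consumer_lag(log_end, committed):
    """Messages left to consume per partition.

    log_end maps partition -> log-end offset; committed maps
    partition -> offset committed by the consumer group. A partition
    with no committed offset is reported as None rather than guessed.
    """
    lag = {}
    for partition, end in log_end.items():
        position = committed.get(partition)
        # Clamp at zero in case the two readings were taken at slightly
        # different moments and the committed offset looks ahead of the end.
        lag[partition] = None if position is None else max(end - position, 0)
    return lag
```

Summing the non-None values gives the total backlog for the topic, which is probably the closest thing to a "queue size" for a given consumer group.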

Minh-Triet LÊ
-
The issue with Kafka Offset Monitor is that I only have ssh access to this machine, so there's no way that I could interact with the interface of the monitor. Is there a command-line-only way of achieving the same thing? – user2419509 Nov 03 '15 at 15:54
-
If you have ssh access, then you can use ssh tunneling to reach the interface, right? Anyway, on the command line you can use zkCli.sh, packaged with Zookeeper, to retrieve zk node data. – Minh-Triet LÊ Nov 04 '15 at 09:24