
I can use KafkaTool and kafka-console-consumer to view data from the __consumer_offsets topic, but I can't figure out how to parse the data in Python if I read it directly with my own custom tool. Even when using KafkaTool, I can't decipher the key and value perfectly; there are odd characters that don't seem to follow any pattern. I think it has to do with the way Scala marshals the data into the raw bytes.

Here's the key format: [short: version] [string: group] [string: topic] [int32: partition], which can be found in https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala - this assumes version 0, which mine is.

Here's an example key in hex format: 00 01 00 16 63 6F 6E 73 6F 6C 65 2D 63 6F 6E 73 75 6D 65 72 2D 39 37 30 38 32 00 0D 73 74 61 67 69 6E 67 2D 73 70 65 6E 64 00 00 00 26

Now going through those bytes -

00 - version 0

01 00 - start-of-heading, null … okay, makes sense but other messages begin with 02 00

16 63 6F 6E 73 6F 6C 65 2D 63 6F 6E 73 75 6D 65 72 2D 39 37 30 38 32 - Looks like good data

00 0D - Null, carriage return … okay, makes sense but others have 00 0C

73 74 61 67 69 6E 67 2D 73 70 65 6E 64 - good data (“staging-spend”)

00 00 00 26 - I guess this is the end of the string plus the partition, in which case 00 00 denotes the end of the string??

Similar issues/inconsistencies with the message. How exactly is the data formatted so I can parse it into string values?
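
For reference, here's a minimal sketch of how I'm trying to unpack that layout in Python with `struct`, assuming big-endian fields and that each string is a 2-byte length followed by UTF-8 bytes (that's my reading of the Scala schema, so those assumptions may be off):

```python
import struct

def read_string(buf, pos):
    # Assumption: each string is a 2-byte big-endian length followed by UTF-8 bytes
    (length,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length

def parse_offset_key(buf):
    # Layout taken from GroupMetadataManager.scala:
    # [short: version] [string: group] [string: topic] [int32: partition]
    (version,) = struct.unpack_from(">h", buf, 0)
    group, pos = read_string(buf, 2)
    topic, pos = read_string(buf, pos)
    (partition,) = struct.unpack_from(">i", buf, pos)
    return version, group, topic, partition

# The example key from above
key = bytes.fromhex(
    "00 01 00 16 63 6F 6E 73 6F 6C 65 2D 63 6F 6E 73 75 6D 65 72 2D 39 37 30 38 32"
    " 00 0D 73 74 61 67 69 6E 67 2D 73 70 65 6E 64 00 00 00 26"
)
print(parse_offset_key(key))
```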

s g
  • Possible duplicate of [Kafka 0.8.2.1 how to read from __consumer_offsets topic](https://stackoverflow.com/questions/33925866/kafka-0-8-2-1-how-to-read-from-consumer-offsets-topic) – OneCricketeer Jul 21 '18 at 06:42
  • Downvote and dupe when you're not even clear on the question? I want to do exactly what I stated. Given the bytes, I want the string data. I'm using python and the string+numerical (short, int, and long) data has been marshaled in scala. Big picture is that I want a consumer which monitors the other consumers for their lag. Happy to hear input on that but the original question still remains. Cheers – s g Jul 21 '18 at 07:10
  • If you can read Scala, then that formatter is completely open source. I down voted for no research https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala – OneCricketeer Jul 21 '18 at 07:14
  • It shouldn't matter if data is serialized in Scala. You can get a bytes object in python – OneCricketeer Jul 21 '18 at 07:16
  • Since you seem so diligent, i'll add the gory detail and maybe you can take a look =) – s g Jul 21 '18 at 07:17
  • Personally, I would suggest using regular console consumer from that answer, then piping the output into a Python program to read the plain string text. Then you don't need any Kafka python libraries – OneCricketeer Jul 21 '18 at 07:19 (a rough sketch of this approach follows the comments)
  • Great. Let's stick to the question at hand, for now. – s g Jul 21 '18 at 07:20
  • Well, for starters, `00 01` is the version since it's a short integer, it's two bytes – OneCricketeer Jul 21 '18 at 07:30
  • I'd agree here that the Scala formatter performs no magic. The question here seems to be asking someone else to debug the translation of Scala to python. The solution of using the console consumer and piping output to python prevents you from reinventing the formatter but you can still use python. – dawsaw Jul 22 '18 at 14:22
  • Did you manage to figure it out? – Erikas Jan 13 '21 at 13:16
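
Regarding the console-consumer suggestion above, a rough sketch of piping its already-decoded output into Python instead of parsing raw bytes could look like the following. The formatter class name and the exact printed line layout vary by Kafka version, and `read_offsets.py` is just a placeholder name, so treat the `::` split as an assumption:

```python
import sys

# Run something along these lines and pipe its stdout into this script
# (the formatter class name depends on the Kafka version):
#
#   kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#     --topic __consumer_offsets --from-beginning \
#     --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" \
#     | python read_offsets.py

for line in sys.stdin:
    line = line.strip()
    # The formatter appears to print the key and value as text separated by "::",
    # e.g. [group,topic,partition]::OffsetAndMetadata(...); the exact shape varies by version.
    if "::" not in line:
        continue
    key_text, _, value_text = line.partition("::")
    print(key_text, value_text)
```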

0 Answers