
I'm using Kafka with Avro messages. One of my fields is defined like this:

{ 
    "name": "a_number", 
    "type": "bytes", 
    "logicalType": "decimal", 
    "precision": 4, 
    "scale": 4 
}

Using the Avro console consumer, I see a message like this:

{"a_number": "\t\u0000°"}

Which I expect to equal 59.

Supposedly, the byte array should be the two's complement representation of the unscaled number. I've tried using Python's struct module to decode it, but the values I get don't make any sense:

import struct

bs = '\t\u0000°'.encode('utf8')    # b'\t\x00\xc2\xb0'
struct.unpack('>l', bs)[0] / 1e4   # 15104.4784

How can I validate the message? Can I decode the string somehow, or has the Avro console consumer corrupted it?

OneCricketeer
z0r
  • For comparison: `struct.pack('>l', int(59 * 1e4)) == b'\x00\t\x00\xb0'` – z0r Nov 21 '18 at 22:16
  • Are you sure that you want to encode high ASCII values as UTF-8? They will gain at least one additional byte, then, which will influence the total value. That said, at least you will *get* 4 bytes. Your sample string defines only 3. – Jongware Nov 21 '18 at 22:38
  • @usr2564301 Yeah, I'm not sure - it does seem weird. The reason I chose UTF-8 is that that's what JSON uses to encode strings, and the output of the Avro console consumer is (apparently) JSON. I am a bit suss on that string; I would have expected it to write something in Base64 or so. – z0r Nov 21 '18 at 23:01
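The byte-count point raised in the comments can be checked directly. The sketch below assumes the console consumer prints each byte as the character with the same code point (0 to 255), which is how Avro's JSON encoding represents `bytes`; the variable names are only illustrative.

```python
# The string exactly as printed by the console consumer
printed = '\t\u0000°'

# UTF-8 turns '°' (U+00B0) into two bytes, so three characters become four bytes
utf8_bytes = printed.encode('utf-8')

# Latin-1 maps each code point 0..255 to the byte of the same value (assumed encoding)
latin1_bytes = printed.encode('latin-1')

print(len(utf8_bytes), utf8_bytes)
print(len(latin1_bytes), latin1_bytes)
```

If that assumption holds, only the three Latin-1 bytes correspond to what was actually written to the topic.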

1 Answer


You seem to be going about this the Hard Way. The approach suggested by "How to extract schema for avro file in python" is to use:

import avro.datafile, avro.io

reader = avro.datafile.DataFileReader(open('filename.avro', 'rb'), avro.io.DatumReader())
schema = reader.meta  # container file metadata, which includes the writer schema

Single-stepping in a debugger to see how the reader decodes your messages should get you closer to assembling a "raw", hand-engineered decode.
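For what it's worth, a hand decode of the printed value might look like the sketch below. It rests on two assumptions: that the console consumer's JSON output represents each byte as the character with the same code point (so Latin-1 recovers the raw bytes), and that the payload is the big-endian two's complement of the unscaled integer, as the decimal logical type specifies. The helper name `decode_avro_decimal` is hypothetical, not part of any Avro library.

```python
from decimal import Decimal


def decode_avro_decimal(json_text: str, scale: int) -> Decimal:
    """Decode a decimal that was printed as a JSON string of byte-valued characters."""
    # Assumption: one character per byte, code point == byte value
    raw = json_text.encode('latin-1')
    # Unscaled value: big-endian two's complement integer
    unscaled = int.from_bytes(raw, byteorder='big', signed=True)
    # Shift the decimal point left by `scale` digits
    return Decimal(unscaled).scaleb(-scale)


value = decode_avro_decimal('\t\u0000°', scale=4)
print(value)
```

With the string from the question and a scale of 4, the three bytes are 0x09 0x00 0xB0, which is 590000 unscaled, so the result compares equal to 59. That would suggest the console output is not corrupted, only encoded differently from UTF-8.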

J_H
  • Yep fair point. I don't have a `.avro` file to read, but maybe I should just write a little Python script using an `AvroConsumer` instead of the (presumably) Java-based console consumer to test it. – z0r Nov 22 '18 at 02:43