
I am new to Avro and I have an Avro file to deserialize. Some schemas use the fixed type to store MAC addresses. The schema below is one of them, and it is used as a type inside other schemas.

The schema for MAC addresses looks like this:

{
    "type": "fixed",
    "name": "MacAddress",
    "size": 6
}

I wrote the first record of the data to a text file using:

from avro.datafile import DataFileReader
from avro.io import DatumReader

reader = DataFileReader(open("data.avro", "rb"), DatumReader())
count = 0
for record in reader:
    if count == 0:
        with open('first_record.txt', 'w') as first_record:
            first_record.write(str(record))
    elif count > 0: break
    count = count + 1
reader.close()

The MAC addresses mentioned above appear in the deserialized data like this:

"MacAddress":"b""\\x36\\xe9\\xad\\x64\\x2d\\x3d",

I know that \x means the next two characters are a hexadecimal byte value. So this is supposed to be "36:e9:ad:64:2d:3d", right? Are "b""" style values the expected output for fixed types?

Also, some values look like this:

"Addr":"b""j\\x26\\xb7\\xda\\x1d\\xf6"

"Addr":"b""\\x28\\xcb\\xc5v\\x14%" 

How can these be MAC addresses? What do the j and % characters mean?

bhdrozgn

1 Answer


Are "b""" style values the expected output for fixed types?

Yes. Fixed types represent raw bytes, and Python displays a bytes object with a b prefix before the quoted string. You have a lot of extra quotes in there, and I'm guessing that's because you are calling str(record), which is probably what introduces the extra backslashes and quote characters. For example:


>>> str(b"\xae")
"b'\\xae'"

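To make the point about str() concrete, here is a minimal sketch. The two-byte value raw is made up for the example, and using json.dumps as the later formatting step is only an assumption about what doubled the backslashes in your file:

```python
import json

raw = b"\xae\x01"          # made-up bytes value, not taken from the data

# str() on a bytes object returns its repr, including the b prefix and quotes.
text = str(raw)
print(text)                # b'\xae\x01'

# Encoding that text as a JSON string escapes every backslash once more.
encoded = json.dumps(text)
print(encoded)             # "b'\\xae\\x01'"

# Decoding gives back the repr text, not the original bytes.
decoded = json.loads(encoded)
```

The bytes themselves are unchanged throughout; only their printed form picks up extra characters at each step.
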
How can these be MAC addresses? What do the j and % characters mean?

Are you sure these are the same record type? The key is Addr instead of MacAddress so it seems like it might be a different record type and schema.
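
If the values do share the fixed schema, the stray characters are most likely just how Python prints bytes: any byte that happens to be printable ASCII is shown as that character instead of a \x escape. So j is the byte 0x6a and % is 0x25. A small sketch, where mac_from_fixed is a hypothetical helper name and not part of the avro package:

```python
def mac_from_fixed(raw: bytes) -> str:
    """Format a 6-byte value as a colon-separated, lowercase hex MAC address."""
    if len(raw) != 6:
        raise ValueError(f"expected 6 bytes, got {len(raw)}")
    return ":".join(f"{byte:02x}" for byte in raw)

# The two values from the question, written as bytes literals.
first = b"j\x26\xb7\xda\x1d\xf6"
second = b"\x28\xcb\xc5v\x14%"

# A printable character and its hex escape are the same byte.
same_byte = (b"j" == b"\x6a") and (b"%" == b"\x25") and (b"v" == b"\x76")

print(mac_from_fixed(first))    # 6a:26:b7:da:1d:f6
print(mac_from_fixed(second))   # 28:cb:c5:76:14:25
```

Both values are still exactly six bytes long, which is what the fixed schema with size 6 requires.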

Scott
  • Yes, I noticed the same thing a few hours ago. I used a JSON formatter after writing to the file, and the formatter added the extra backslashes and quotes. Normally it looks like "Addr": b'j\x26\xb7\xda\x1d\xf6', and simply calling .hex() on these byte strings gives me what I want. And yes, they have the same schema; the MAC address schema is used by many records under different names. – bhdrozgn Aug 24 '21 at 22:15
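
A sketch of the approach described in this comment, for writing a record out as JSON without a formatter mangling the bytes. encode_fixed is a hypothetical helper and record is a stand-in for one deserialized record; note that it converts every bytes value it meets, not only MAC addresses:

```python
import json

def encode_fixed(value):
    """Hypothetical json.dumps fallback: render bytes as colon-separated hex."""
    if isinstance(value, (bytes, bytearray)):
        return ":".join(f"{byte:02x}" for byte in value)
    raise TypeError(f"cannot serialize {type(value).__name__}")

# Stand-in for one record read from the Avro file.
record = {"Addr": b"j\x26\xb7\xda\x1d\xf6"}

plain_hex = record["Addr"].hex()                  # no separators
as_json = json.dumps(record, default=encode_fixed)

print(plain_hex)   # 6a26b7da1df6
print(as_json)     # {"Addr": "6a:26:b7:da:1d:f6"}
```

json.dumps only calls the default function for values it cannot serialize itself, so strings and numbers in the record pass through untouched.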