Python double loses precision using avro schema

Question

I'm serialising some data using an Avro schema. The code is written in Python, and I'm facing a loss of precision: it looks like Python is rounding the number and printing it in scientific notation.

What I see: 1.2345678901234568e+16

What I expect to see: 12345678901234567.19

The code example is below.

Reproducible code sample:

from fastavro import writer, reader, parse_schema

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'double'},
        {'name': 'temp', 'type': 'double'},
    ],
}
parsed_schema = parse_schema(schema)

# 'records' can be an iterable (including generator)
records = [
    {'station': '011990-99999', 'temp': 0, 'time': 1433269388},
    {'station': '011990-99999', 'temp': -11, 'time': 12345678901234567.19},
    {'station': '012650-99999', 'temp': 111, 'time': 1433275478},
]

# Writing
with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

# Reading
with open('weather.avro', 'rb') as fo:
    for record in reader(fo):
        print(record)
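To check whether Avro itself drops any digits, the snippet below mimics the encoding the Avro specification defines for `double` (8 bytes, little-endian IEEE 754) using only `struct`, not fastavro. It is a minimal sketch: the helper names `avro_encode_double` and `avro_decode_double` are hypothetical. It suggests the encoding is bit-exact and that the digits are already gone in the float that gets packed.

```python
import struct


def avro_encode_double(value: float) -> bytes:
    # Avro spec: a double is written as 8 bytes, little-endian IEEE 754
    return struct.pack("<d", value)


def avro_decode_double(data: bytes) -> float:
    (value,) = struct.unpack("<d", data)
    return value


original = 12345678901234567.19  # already rounded by the Python parser
round_tripped = avro_decode_double(avro_encode_double(original))

print(round_tripped == original)                # True: encoding is bit-exact
print(original == 12345678901234568.0)          # True: ".19" was lost at parse time
print(repr(round_tripped))                      # 1.2345678901234568e+16
```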

I believe there might be a way to override the deserializer (write my own) that would give me control over how a double is deserialized into a string.

Any ideas?

You're seeing scientific notation. Have you tried to expand that whole number? — OneCricketeer, Oct 26 '21 at 16:13
https://stackoverflow.com/questions/658763/how-to-suppress-scientific-notation-when-printing-float-values — OneCricketeer, Oct 26 '21 at 16:57
Yes, I tried. Expanding it only displays the already-rounded number in the selected format; it does not fix the precision problem. Try it yourself: >>> b = 123456789012345678.789, then >>> b gives 1.2345678901234568e+17, and >>> f'{b:20.5f}' gives '123456789012345680.00000' — Bruno, Oct 26 '21 at 18:03
I don't think this really has anything to do with Avro. As the answer below shows, Avro's Decimal type or Python's decimal module would be better for precise values — OneCricketeer, Oct 26 '21 at 21:26
OneCricketeer, indeed, it's not Avro's fault; it's just how Python (like most languages) handles large floating-point numbers. Using decimal does the trick. — Bruno, Oct 27 '21 at 09:08
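The point made in the comments can be checked with the standard library alone. A format specifier only changes how the stored binary value is printed; `decimal.Decimal(float)` reveals that stored value exactly, while building a `Decimal` from a string keeps every digit that was typed. This is a small illustrative sketch; the variable names are arbitrary.

```python
from decimal import Decimal

b = 123456789012345678.789  # parsed into the nearest double

print(f"{b:20.5f}")   # 123456789012345680.00000  (formatting the rounded value)
print(Decimal(b))     # 123456789012345680  (exact value of the stored double)

exact = Decimal("123456789012345678.789")  # built from a string: no binary rounding
print(exact)                               # 123456789012345678.789
print(exact + Decimal("0.001"))            # 123456789012345678.790
```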

score 0 · Accepted Answer · answered Oct 26 '21 at 16:19

If you want to go the custom logical type route, fastavro supports that: https://fastavro.readthedocs.io/en/latest/logical_types.html#custom-logical-types. Of course, if other Avro implementations also read the data, they won't understand the custom logical type and will fall back to the underlying Avro type.
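One way to apply the custom logical type idea is to carry the value as an Avro `string` and convert to and from `decimal.Decimal` at the edges. The two functions below are a hypothetical sketch of that conversion; the logical type name `exact-decimal` is an assumption, not something Avro or fastavro defines. The fastavro page linked above describes registering such hooks in `fastavro.write.LOGICAL_WRITERS` and `fastavro.read.LOGICAL_READERS` under a `"<avro_type>-<logical_type>"` key; check that page for the exact hook signatures in your fastavro version. The sketch itself runs with the standard library only.

```python
from decimal import Decimal

# Hypothetical schema fragment for the "time" field:
#   {"name": "time", "type": {"type": "string", "logicalType": "exact-decimal"}}


def encode_decimal_as_string(data, *args):
    """Writer hook (hypothetical): Decimal, int or numeric string -> string."""
    if isinstance(data, float):
        # A float has already lost digits; refuse it instead of hiding the loss.
        raise TypeError("pass a Decimal or a string, not a float")
    return str(Decimal(data))


def decode_string_as_decimal(data, *args):
    """Reader hook (hypothetical): string -> Decimal with every digit kept."""
    return Decimal(data)


# Registration following the fastavro docs pattern (not executed here):
#   fastavro.write.LOGICAL_WRITERS["string-exact-decimal"] = encode_decimal_as_string
#   fastavro.read.LOGICAL_READERS["string-exact-decimal"] = decode_string_as_decimal

encoded = encode_decimal_as_string(Decimal("12345678901234567.19"))
print(encoded)                            # 12345678901234567.19
print(decode_string_as_decimal(encoded))  # 12345678901234567.19
```

Rejecting floats in the writer is a design choice for this sketch: it surfaces the rounding at write time instead of silently storing an already-rounded value.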

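However, the main problem comes from floating-point rounding, which is present in just about every language. A double has a 53-bit significand (roughly 15 to 17 significant decimal digits), so 12345678901234567.19 is rounded to the nearest representable value, 12345678901234568, as soon as Python parses the literal, before Avro ever sees it. A better choice to ensure that no rounding is done is probably to use a Decimal type: https://avro.apache.org/docs/current/spec.html#Decimal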
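To make the Decimal suggestion concrete: per the Avro specification linked above, a `decimal` annotates `bytes` (or `fixed`) and stores the unscaled integer (value × 10^scale) as big-endian two's-complement bytes, with `precision` and `scale` declared in the schema. The sketch below reproduces that encoding with the standard library so the lossless round trip is visible. The helper names and the choice of precision 20 and scale 2 are assumptions for this example. fastavro supports the `decimal` logical type directly and accepts `decimal.Decimal` values, so in practice changing the schema and passing `Decimal` values should be enough.

```python
from decimal import Decimal

# Schema change for the "time" field (precision/scale chosen for this example).
TIME_FIELD = {
    "name": "time",
    "type": {"type": "bytes", "logicalType": "decimal", "precision": 20, "scale": 2},
}


def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Avro decimal: unscaled integer as big-endian two's-complement bytes."""
    unscaled = int(value.scaleb(scale))  # 12345678901234567.19 -> 1234567890123456719
    if Decimal(unscaled).scaleb(-scale) != value:
        raise ValueError("value has more fractional digits than the schema's scale")
    length = (unscaled.bit_length() + 8) // 8  # leave room for the sign bit
    return unscaled.to_bytes(length, "big", signed=True)


def decode_decimal(data: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(data, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


value = Decimal("12345678901234567.19")
encoded = encode_decimal(value, scale=2)
print(len(encoded), decode_decimal(encoded, scale=2))  # 8 12345678901234567.19
```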

Not exactly the full answer I was looking for, but it does help a lot, marked as solution. — Bruno, Oct 27 '21 at 09:09
