
I'm serialising some data with an Avro schema in Python (using fastavro) and I'm losing precision. It looks like Python is rounding the numbers and printing them in scientific notation.

What I see: 1.2345678901234568e+16

What I expect to see: 12345678901234567.19

The code example is below.

Reproducible code sample:

from fastavro import writer, reader, parse_schema

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'double'},
        {'name': 'temp', 'type': 'double'},
    ],
}
parsed_schema = parse_schema(schema)

# 'records' can be an iterable (including generator)
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': -11, u'time': 12345678901234567.19},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

# Writing
with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

# Reading
with open('weather.avro', 'rb') as fo:
    for record in reader(fo):
        print(record)

I believe there might be a way to override the default behaviour and write my own deserialiser, which would give me control over how a double is deserialised into a string.

Any ideas?

Bruno
  • You're seeing scientific notation. Have you tried to expand that whole number? – OneCricketeer Oct 26 '21 at 16:13
  • What exactly do you mean by expanding the whole number? – Bruno Oct 26 '21 at 16:30
  • https://stackoverflow.com/questions/658763/how-to-suppress-scientific-notation-when-printing-float-values – OneCricketeer Oct 26 '21 at 16:57
  • Yes, I tried. Formatting only displays the number in the chosen format; it does not fix the precision problem. Try it yourself: >>> b = 123456789012345678.789 >>> b 1.2345678901234568e+17 >>> f'{b:20.5f}' '123456789012345680.00000' – Bruno Oct 26 '21 at 18:03
  • I don't think this has really anything to do with avro. As the answer below shows, Decimal types or Python's decimal class would be better for precise values – OneCricketeer Oct 26 '21 at 21:26
  • OneCricketeer, indeed, it's not Avro's fault, it's just how Python handles big floating-point numbers. Using decimal will do the trick. – Bruno Oct 27 '21 at 09:08

1 Answer


If you want to go about using a custom logical type, fastavro has support for that: https://fastavro.readthedocs.io/en/latest/logical_types.html#custom-logical-types. Of course, if other implementations are also being used, then they won't understand the custom logical type.
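
As a rough sketch of that route, the pair of functions below stores the value as an Avro string and converts it to and from Python's Decimal. The logical type name "decimalstring" and both function names are made up for illustration; the registration through fastavro.write.LOGICAL_WRITERS and fastavro.read.LOGICAL_READERS follows the pattern on the linked documentation page, and it is skipped here if fastavro is not installed.

```python
from decimal import Decimal

try:
    import fastavro
except ImportError:  # the conversion functions below still work on their own
    fastavro = None


def encode_decimal_as_string(data, *args):
    # Writer side: turn the Decimal into its exact string form.
    return str(data)


def decode_string_as_decimal(data, *args):
    # Reader side: rebuild the Decimal from the stored string.
    return Decimal(data)


# Hypothetical field definition: an Avro string tagged with a made-up logical type.
time_field = {
    "name": "time",
    "type": {"type": "string", "logicalType": "decimalstring"},
}

if fastavro is not None:
    # Keys have the form "<avro type>-<logical type>", as in the fastavro docs.
    fastavro.write.LOGICAL_WRITERS["string-decimalstring"] = encode_decimal_as_string
    fastavro.read.LOGICAL_READERS["string-decimalstring"] = decode_string_as_decimal

original = Decimal("12345678901234567.19")
restored = decode_string_as_decimal(encode_decimal_as_string(original))
print(restored)  # 12345678901234567.19
```

Readers that do not know the custom logical type would simply see a plain string in that field.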

However, the main problem comes from floating point rounding, which is present in just about every language: a double only holds about 15 to 17 significant decimal digits, so 12345678901234567.19 cannot be stored exactly. A better choice to ensure that no rounding is done is probably to use a Decimal type: https://avro.apache.org/docs/current/spec.html#Decimal
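
To make the difference concrete, here is a minimal sketch using only the standard library. It shows the rounding from the question, a field definition written after the decimal logical type in the Avro spec, and the byte layout the spec describes (the unscaled integer in big-endian two's complement). The precision and scale values and the two helper functions are assumptions chosen for this example, not fastavro internals; with fastavro you would put a field like this in the schema and pass Decimal values in the records.

```python
from decimal import Decimal

# The float literal is rounded to the nearest double as soon as it is parsed.
as_float = 12345678901234567.19
# Build the Decimal from a string so the digits never pass through a float.
as_decimal = Decimal("12345678901234567.19")

print(repr(as_float))  # 1.2345678901234568e+16
print(as_decimal)      # 12345678901234567.19

# Field definition for the decimal logical type (precision and scale are assumed).
time_field = {
    "name": "time",
    "type": {"type": "bytes", "logicalType": "decimal", "precision": 19, "scale": 2},
}


def to_avro_decimal_bytes(value, scale):
    # Shift the decimal point to get the unscaled integer, e.g. 1.19 -> 119.
    # Assumes the value has at most `scale` fractional digits and fits the
    # default decimal context (28 significant digits).
    unscaled = int(value.scaleb(scale))
    # Leave room for the sign bit; this can be one byte longer than the minimum.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, "big", signed=True)


def from_avro_decimal_bytes(raw, scale):
    # Reverse of the above: read the integer, then move the decimal point back.
    return Decimal(int.from_bytes(raw, "big", signed=True)).scaleb(-scale)


encoded = to_avro_decimal_bytes(as_decimal, 2)
print(from_avro_decimal_bytes(encoded, 2))  # 12345678901234567.19
```

Because the value travels as an integer plus a fixed scale, nothing is rounded on the way in or out.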

Scott
  • Not exactly the full answer I was looking for, but it does help a lot, marked as solution. – Bruno Oct 27 '21 at 09:09