
I'm evaluating Elasticsearch and have generated a batch of fake data. The amount field is defined as a double. Here is the relevant part of the mapping (other fields omitted): "authamount": { "type": "double" }

In the Java code that generates the random numbers I specify 2 decimal places, and the data looks OK in Elasticsearch.

When I run a stats query as follows:

{
    "query": {
        "constant_score": {
            "filter": {
                "range": {
                    "txndatestring": {
                        "gte": "2017-01-01T15:44:04.068Z",
                        "lte": "2017-01-31T15:44:04.068Z"
                    }
                }
            }
        }
    },
    "aggs": {
        "auth_amount_stats": {
            "stats": { "field": "authamount" }
        }
    }
}

I see this result:

"aggregations": {
    "auth_amount_stats": {
        "count": 20810,
        "min": 5.03,
        "max": 1474.24,
        "avg": 734.682198942815,
        "sum": 15288736.559999982
    }
}

I don't understand how the sum can have so many decimal places when every value being summed has only two.

VladimirSD
  • Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – Nathan Hughes Jul 24 '18 at 00:19

1 Answer


Most decimal fractions don't have an exact binary floating-point representation. A double stores a value as a binary fraction, so a number such as 5.03 is held as the nearest value the format can represent rather than as exactly 5.03. This is not normally significant and can be dealt with by rounding to the appropriate number of decimal places when displaying the number. However, the tiny difference between each two-decimal-place number and its internal floating-point representation accumulates when you do arithmetic on many values, as your sum does.

For this reason you have to be careful when comparing floats. For instance, your sum would not be strictly equal to 15,288,736.56 because of the lost precision, even though that is what the sum should be as a decimal value.
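To see the effect outside Elasticsearch, here is a minimal Java sketch. The class and method names (`FloatSumDemo`, `sumDoubles`, `sumDecimals`, `nearlyEqual`) are made up for illustration and are not part of any Elasticsearch API. It adds the same two-decimal value repeatedly as a double and as a `BigDecimal`, and compares doubles with a tolerance instead of `==`.

```java
import java.math.BigDecimal;

public class FloatSumDemo {
    // Add the same value n times using binary doubles; rounding error accumulates.
    static double sumDoubles(double value, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += value;
        }
        return total;
    }

    // Add the same value n times using BigDecimal built from a string, which is exact in decimal.
    static BigDecimal sumDecimals(String value, int n) {
        BigDecimal step = new BigDecimal(value);
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(step);
        }
        return total;
    }

    // Compare two doubles within an absolute tolerance instead of using ==.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double asDouble = sumDoubles(0.10, 10);
        System.out.println("double sum:     " + asDouble);
        System.out.println("equals 1.0?     " + (asDouble == 1.0));
        System.out.println("nearly 1.0?     " + nearlyEqual(asDouble, 1.0, 1e-9));
        System.out.println("BigDecimal sum: " + sumDecimals("0.10", 10));
    }
}
```

The tolerance of `1e-9` is an arbitrary choice for this example; for currency, half of the smallest unit you care about (for example 0.005) is a common starting point.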

You could use a scaled_float to represent your two-decimal number:

    "authamount": {
      "type": "scaled_float",
      "scaling_factor": 100
    }

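A scaled_float is stored as a long: the value is multiplied by the scaling factor (itself a double) and the result is kept as an integer, so with a factor of 100 the amount 5.03 is stored as 503. Integers are easier to compress than floating-point values, so this is more efficient to store.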
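The same idea can be applied in application code. The sketch below is only an illustration of the scaled-integer approach, not how Elasticsearch implements scaled_float internally; the names `ScaledAmountDemo`, `toScaled`, `sumScaled` and `format` are hypothetical. It converts each amount to whole cents, sums the cents as longs, and converts back to a decimal string only for display.

```java
import java.math.BigDecimal;

public class ScaledAmountDemo {
    // Assumed scaling factor, matching the two-decimal amounts in the question.
    static final double SCALING_FACTOR = 100.0;

    // Convert a two-decimal amount to whole cents, rounding away the binary error.
    static long toScaled(double amount) {
        return Math.round(amount * SCALING_FACTOR);
    }

    // Sum amounts as integer cents; integer addition is exact (barring overflow).
    static long sumScaled(double[] amounts) {
        long total = 0L;
        for (double amount : amounts) {
            total += toScaled(amount);
        }
        return total;
    }

    // Render integer cents as a decimal string with two places, for display only.
    static String format(long scaled) {
        return BigDecimal.valueOf(scaled, 2).toPlainString();
    }

    public static void main(String[] args) {
        double[] amounts = {5.03, 1474.24, 0.10};
        long totalCents = sumScaled(amounts);
        System.out.println(totalCents + " cents = " + format(totalCents));
    }
}
```

Rounding once per value, at conversion time, keeps each stored amount exact to the cent, so the total cannot drift the way a running double sum can.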

Chris Latta