0

In SQL Server I create an aggregated column (a combination of other columns that I add, multiple, sum etc) which is of SQL datatype float.

However, when I run the same query multiple times, the last 2 digits of my float are unstable and keep changing.

Below the floats I get with the random last two digits - I try to convert to decimal and then chop off the last two digits.

select round(convert(decimal(20,19), 0.0020042890676442646), 17,1)
select round(convert(decimal(20,19), 0.0020042890676442654), 17,1)

In SSMS the result for both is: 0.0020042890676442600 as expected.

Mind you, the input constants here i took from python, so they might have been modified already. I can't take them from sql directly, as it is incredibly rare to get the calculation anomaly and i don't know how to reproduce it.

But running this via pypyodbc to python, sometimes the result is a python decimal.Decimal type with value 0.0020042890676442700 for the second statement, so it does seem to do rounding rather than truncation.

I have also noticed that the result of the calculation in sql is not always the same, and there is instability there in the last digit of the float - not sure how to test this sytematically though.

The constants casted to floats give:

select convert(float,0.0020042890676442646)
select convert(float,0.0020042890676442654)

Result: 0.00200428906764427.

Wrapped in decimals and rounded:

select round(convert(decimal(20,19), convert(float,0.0020042890676442646)), 17,1)
select round(convert(decimal(20,19), convert(float,0.0020042890676442654)), 17,1)

The result in SSMS is: 0.0020042890676442700 in both cases.

I tried sending back the floats directly instead of casting to decimal, but it seems the two unstable digits are always added at the end when they reach python. Even truncating doesn't help, other random numbers are then added.

It almost seems as if python modifies both float and Decimal during transport in a random manner, or that the instability is in sql already or both.

I tried truncating the np.float64 on the python side like this: Truncating decimal digits numpy array of floats

but as the last float digit in sql can be between e15 and e19 I can't set a consistent truncate level unless i floor everything at e15.

Kai Aeberli
  • 1,189
  • 8
  • 21
  • if you check above, i am converting the float to decimal(20,19). Is this what you meant? I can't change the columns in my aggregation unfortunately, as they come from a production database. – Kai Aeberli Jul 29 '18 at 18:40
  • 1
    . . I don't think you have any floats in your code. The constants are, I think, interpreted as decimals. – Gordon Linoff Jul 29 '18 at 18:50
  • 1
    You can see using `sp_describe_first_result_set` that SQL Server interprets both literals as `numeric(19,19)`: EXEC sp_describe_first_result_set N'SELECT 0.0020042890676442646'; EXEC sp_describe_first_result_set N'SELECT 0.0020042890676442654';. It seems python is doing something behind your back. – Dan Guzman Jul 29 '18 at 18:57
  • thanks Gordon, I updated the question. – Kai Aeberli Jul 29 '18 at 19:01
  • agree Dan - seems in the first case its a numeric(19,19), which sql correctly truncates to 17, so ending in 600. In the second case, that float ends in 7, which truncated to 17 gives an ending of 700. – Kai Aeberli Jul 29 '18 at 19:12
  • it seems python makes the sql float longer by adding two extra digits at the end, and these two digits are random values – Kai Aeberli Jul 29 '18 at 19:17
  • 2
    The first issue is, why would an aggregate give different answers at different times? When calculating an aggregate there is no guarantee that the records will be processed in the same order every time, in the same way that order is not guaranteed in any query unless you use ORDER BY. When doing math on floats, order can be important. If you have a list of floats and start adding from the smallest values you can get a different answer than if you start adding from the largest values. – David Dubois Jul 29 '18 at 20:19
  • Thanks David - to your knowledge, is there a way of enforcing a processing order in an aggregate in sql that could help stabilise the values? – Kai Aeberli Jul 29 '18 at 20:49
  • 1
    Floats are only accurate to 15 digits in SQL Server. If you have more than that, the precision will be lost. This is documented. – Thom A Jul 29 '18 at 21:37
  • Thanks Larnu - in that case, what is the meaning of float columns which have more than 15 digits after the comma? I am trying to understand why more than 15 digits are shown if it is only accurate to 15 digits. For example in SSMS I can see: 0.002 004 289 067 644 26 which is 17 digits. – Kai Aeberli Jul 29 '18 at 22:17
  • 2
    @KaiAeberli - There are 17 digits after the decimal separator but the first two zeros are not *significant digits*, they are merely a result of the scaling (exponent) of the float value: 2.00428906764426E-3 – Gord Thompson Jul 30 '18 at 18:05

1 Answers1

3

The order of processing of an aggregate is undefined, in the same way that the order of the results of any query are undefined unless you use an ORDER BY clause. In the case of floats, order matters. Order of aggregate processing can be forced using an OVER clause. Here's some code to demonstrate:

 -- demonstrate that order matters when adding floats

declare @a float
declare @b float
declare @c float
declare @d float
declare @e float

set @a = 1
set @b = 1
set @c = 9024055778268167

-- add A to B, and then add C
-- result is 9024055778268170

set @d = @a + @b

set @e = @d + @c

select cast( @e as decimal(38,0) )

-- add C to B, and then add A
-- result is 9024055778268168

set @d = @c + @b

set @e = @d + @a

select cast( @e as decimal(38,0) )

-- put these values into a table

create table OrderMatters ( x float )

insert into OrderMatters ( x ) values ( @a )
insert into OrderMatters ( x ) values ( @b )
insert into OrderMatters ( x ) values ( @c )

declare @x float

-- add them in ascending order
-- result is 9024055778268170

select @x = sum(x) over (order by x asc ) from OrderMatters

select cast(@x as decimal(38,0))

-- add them in descending order
-- result is 9024055778268168

select @x = sum(x) over (order by x desc ) from OrderMatters

select cast(@x as decimal(38,0))
David Dubois
  • 3,842
  • 3
  • 18
  • 36
  • This answered one of the problems with my python code, I had refactored a+b to b+a thus the imprecision. The other problems were: compute floats in SQL which were than badly translated to python. I ended up just transfering integers via ODBC and computing the floats in python, so there was no need to convert from sql float to np.float64, which seemed to create the imprecision. – Kai Aeberli Jul 31 '18 at 09:15