0

I have run into the inexactness of floating point numbers. In the simplified version of my code below, I compute the same sum in two equivalent ways, and the results differ in the very last decimals. This is related to [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken).

I was wondering if there is any way to resolve this, for example some sort of truncation?

s1 = zero(Float64)
s = zero(Float64)
for a in 1:10000
    for k in 1:4
        temp=log(1e-5)
        s1+=temp
        s +=temp
    end
end
s2 = zero(Float64)
for a in 1:10000
    for k in 1:4
        temp = log(1e-5)
        s2+=temp
        s+=temp
    end
end

s1+s2 == s     ###FALSE
s < s1+s2      ###TRUE
A.Yazdiha
  • Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – EOF Sep 06 '17 at 14:10
  • use `≈` (\approx) instead of `=`? `s1 + s2 ≈ s #=> true` – Gnimuc Sep 06 '17 at 15:16
  • Not a duplicate: this is about strategies to resolve the floating point issue in a particular language, not about why it happens. – mbauman Sep 07 '17 at 20:15

2 Answers

4

Floating point addition is not associative: changing the order in which you do the addition can give subtly different results due to intermediate rounding.
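
You can see the non-associativity with three ordinary literals (a small illustration added here, not taken from the question): the only thing that changes between the two expressions is the grouping.

julia> (1e-16 + 1.0) - 1.0     # the 1e-16 is absorbed by the 1.0 and lost
0.0

julia> 1e-16 + (1.0 - 1.0)     # same operands, different grouping
1.0e-16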

There are a couple of things you can do:

  1. Use a more accurate summation algorithm, such as Kahan compensated summation. This is available in Julia as `sum_kbn` (a hand-rolled sketch is shown after this list):

julia> sum_kbn(log(1e-5) for a = 1:10000, k = 1:4)
-460517.01859880914

  2. Check for approximate equality instead of exact equality. As mentioned by Gnimuc, you can use `isapprox` (also available via the infix operator `≈`). This accepts the optional arguments `atol` and `rtol` for specifying the absolute and relative tolerances, respectively.

s1+s2 ≈ s
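
For reference, the compensated summation behind option 1 fits in a few lines. This is a minimal sketch of the Kahan algorithm, not the actual `sum_kbn` implementation, and the name `kahan_sum` is made up here:

function kahan_sum(xs)
    s = 0.0              # running sum (assumes Float64 summands)
    c = 0.0              # compensation: low-order bits lost so far
    for x in xs
        y = x - c        # apply the correction carried over from the previous step
        t = s + y        # this addition loses the low-order bits of y ...
        c = (t - s) - y  # ... recover them and carry them into the next step
        s = t
    end
    return s
end

kahan_sum(log(1e-5) for a = 1:10000, k = 1:4)  # should agree with sum_kbn above to within a bit or two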

Simon Byrne
2

A Float64 represents a number in the IEEE 754 binary64 format. Of the 64 bits, the mantissa occupies 52 (53 counting the implicit leading bit), which corresponds to a precision of about 15 to 16 decimal digits.
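
You can query that spacing directly with `eps` (an illustration added here, not part of the original answer); the absolute spacing between representable values grows with the magnitude of the number:

julia> eps(Float64)      # spacing of Float64 values around 1.0
2.220446049250313e-16

julia> eps(1e6)          # spacing around 1e6 is much coarser
1.1641532182693481e-10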

After your first loop, s1 and s should be the same.

The difference arises in the body of the second loop. The value of temp is always the same. When the second loop body is entered for the first time, s already holds roughly temp * 40000.0. Adding temp to a value 40000 times larger discards roughly the last 4.5 decimal digits of temp. That is the cause of the difference: s2 starts at zero, so adding temp to it keeps all of temp's digits.
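
The effect is easy to reproduce in isolation: once one operand is much larger than the other, the trailing digits of the smaller one are rounded away (a made-up example, assuming plain Float64):

julia> x = 1.0e9; y = 0.123456789;

julia> (x + y) - x      # everything in y below the spacing near 1e9 (about 1.2e-7) is lost
0.12345683574676514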

By the way, neither s nor s1+s2 will be (programmatically) equal to temp * 80000.0, although mathematically all three numbers should be the same.

So, to reduce numeric errors, follow these rules:

  • Prefer multiplications; they retain precision.
  • Adding two numbers is more precise the closer they are in magnitude. With a = abs(A), b = abs(B), a > b, b != 0, you lose roughly log10(a / b) decimal digits of the smaller number.
  • Subtracting two numbers loses precision the closer they are to each other. You lose roughly as many digits (more exactly, bits) as the two numbers share at the leading end (see the small example after this list).
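
A tiny illustration of the last rule (an example added here, using the classic 0.1 + 0.2 values): the result should be exactly zero, but after the common leading digits cancel, only the rounding errors of the inputs remain.

julia> (0.1 + 0.2) - 0.3    # mathematically 0
5.551115123125783e-17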

None of this is specific to Julia; it happens in every language that uses IEEE 754 arithmetic (C/C++ double, Java double, Julia Float64, ...).

s1 = zero(Float64)
s = zero(Float64)
for a in 1:40000
    temp=log(1e-5)
    s1+=temp
    s +=temp 
end
s2 = zero(Float64)
for a in 1:40000
    temp = log(1e-5)
    s2+=temp
    s+=temp
end

sm = 80000.0 * log(1e-5);
println(s1+s2 == s)
println(s < s1+s2)
println(s == sm)
println(s1+s2 == sm)

println(s1+s2)
println(sm)
println(s)

outputs

false
true
false
false
-921034.0371971375
-921034.0371976183
-921034.0371986133
stefan bachert