Let's start with the difference between x and log(exp(x)), because the rest of the computation is the same.
>>> from math import exp, log
>>> for i in range(10):
...     x = 10**-i
...     y = exp(x)
...     print(x, log(y))
...
1 1.0
0.1 0.10000000000000007
0.01 0.009999999999999893
0.001 0.001000000000000043
0.0001 0.00010000000000004326
9.999999999999999e-06 9.999999999902983e-06
1e-06 9.99999999962017e-07
1e-07 9.999999994336786e-08
1e-08 9.999999889225291e-09
1e-09 1.000000082240371e-09
If you look closely, you might notice that there are errors creeping in. When i = 0, there's no error. When i = 1, it prints 0.1 for x and 0.10000000000000007 for log(y), which is wrong only after 16 digits. By the time i = 9, log(y) is wrong in half of its digits. Since we know what the true answer is (x), we can easily compute the relative error of the approximation:
>>> for i in range(10):
...     x = 10**-i
...     y = exp(x)
...     z = log(y)
...     print(i, abs((x - z)/z))
...
0 0.0
1 6.938893903907223e-16
2 1.0755285551056319e-14
3 4.293440603042413e-14
4 4.325966668215291e-13
5 9.701576564765975e-12
6 3.798286318045685e-11
7 5.663213319457187e-10
8 1.1077471033430869e-08
9 8.224036409872509e-08
Each step loses us about a digit of accuracy! Why? Each of the operations 10**-i, exp(x), and log(y) only introduces a tiny relative error into the result, less than 10⁻¹⁵. Suppose exp(x) introduces a relative error of δ, returning the number eˣ⋅(1 + δ) instead of eˣ (which, after all, is a transcendental number that can't be represented by a finite string of digits). We know that |δ| < 10⁻¹⁵, but what happens when we then try to compute log(eˣ⋅(1 + δ)) as an approximation to log(eˣ) = x? We might hope to get x⋅(1 + ε) where ε is very small. But log(eˣ⋅(1 + δ)) = log(eˣ) + log(1 + δ) = x + log(1 + δ) = x⋅(1 + log(1 + δ)/x), so ε = log(1 + δ)/x. And even if δ is small, x = 10⁻ⁱ gets closer and closer to zero as i increases, so the error ε = log(1 + δ)/x ≈ δ/x can get worse and worse as i increases because 1/x → ∞.
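You can check this growth rate numerically; here's a sketch (assuming IEEE 754 doubles, whose unit roundoff is 2⁻⁵³) comparing the observed round-trip error against the δ/x scale:

from math import exp, log

eps = 2.0**-53  # unit roundoff of IEEE 754 double precision, a stand-in for delta
for i in range(10):
    x = 10**-i
    z = log(exp(x))
    # observed relative error of the round trip, next to the delta/x scale
    print(i, abs((x - z)/x), eps/x)

On a typical system the first column tracks the second, growing about tenfold with each step, just like the table above.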
We say that the logarithm function is ill-conditioned near 1: if you evaluate it at an approximation to an input near 1, it can turn a very small input error into an arbitrarily large output error. In fact, you can only go a few more steps before exp(x) is rounded to 1 and so log(y) returns zero exactly. This is not because of anything in particular about floating-point: any kind of approximation would have the same effect with log! The condition number of a function is a property of the mathematical function itself, not of the floating-point arithmetic system. If the inputs came from physical measurements, you could run into the same problem.
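Here is the ill-conditioning in isolation, in a sketch that doesn't involve exp at all: perturb an input near 1 by about one ulp, and watch how much log magnifies the relative error (again assuming IEEE 754 doubles):

from math import log

y = 1 + 2.0**-30         # an input very close to 1
yy = y * (1 + 2.0**-52)  # the same input, perturbed by about one ulp
rel_in = abs((yy - y)/y)                  # about 2e-16
rel_out = abs((log(yy) - log(y))/log(y))  # about 2e-7
print(rel_out / rel_in)  # magnification on the order of 1/log(y) ≈ 2**30

A relative error in the last digit of the input becomes a relative error in the seventh digit of the output.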
This is related to why the functions expm1 and log1p exist. Although the function log(y) is ill-conditioned near y = 1, the function log(1 + y) is not ill-conditioned near y = 0, so log1p(y) computes it more accurately than evaluating log(1 + y) can. Similarly, the subtraction in exp(x) - 1 is subject to catastrophic cancellation when eˣ ≈ 1, so expm1(x) computes eˣ − 1 more accurately than evaluating exp(x) - 1 can.
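A quick demonstration of the difference, with an argument so small that the naive forms lose everything to rounding (exact results assume IEEE 754 doubles):

>>> from math import expm1, log1p
>>> t = 1e-18
>>> log(1 + t)    # 1 + t rounds to exactly 1.0 before log ever sees it
0.0
>>> log1p(t)
1e-18
>>> exp(t) - 1    # exp(t) rounds to exactly 1.0, and the subtraction gives 0
0.0
>>> expm1(t)
1e-18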
expm1 and log1p are not the same functions as exp and log, of course, but sometimes you can rewrite subexpressions in terms of them to avoid ill-conditioned domains. In this case, for example, if you rewrite log(eˣ) as log(1 + [eˣ − 1]), and use expm1 and log1p to compute it, the round trip is often computed exactly:
>>> for i in range(10):
...     x = 10**-i
...     y = expm1(x)
...     z = log1p(y)
...     print(i, x, z, abs((x - z)/z))
...
0 1 1.0 0.0
1 0.1 0.1 0.0
2 0.01 0.01 0.0
3 0.001 0.001 0.0
4 0.0001 0.0001 0.0
5 9.999999999999999e-06 9.999999999999999e-06 0.0
6 1e-06 1e-06 0.0
7 1e-07 1e-07 0.0
8 1e-08 1e-08 0.0
9 1e-09 1e-09 0.0
For similar reasons, you might want to rewrite (exp(x) - 1)/x as expm1(x)/x. If you don't, then when exp(x) returns eˣ⋅(1 + δ) rather than eˣ, you will end up with (eˣ⋅(1 + δ) − 1)/x = (eˣ − 1 + eˣ⋅δ)/x = (eˣ − 1)⋅[1 + eˣ⋅δ/(eˣ − 1)]/x, which again may blow up because the error is eˣ⋅δ/(eˣ − 1) ≈ δ/x.
However, it is not simply luck that the second definition seems to produce the correct results! This happens because the compounded errors (δ/x from exp(x) - 1 in the numerator, and δ/x from log(exp(x)) in the denominator) cancel each other out. The first definition computes the numerator badly and the denominator accurately, but the second definition computes them both about equally badly! In particular, when x ≈ 0, we have

log(eˣ⋅(1 + δ)) = x + log(1 + δ) ≈ x + δ

and

eˣ⋅(1 + δ) − 1 = eˣ + eˣ⋅δ − 1 ≈ 1 + x + δ − 1 = x + δ.

Note that δ is the same in both cases, because it's the error of using exp(x) to approximate eˣ.
You can test this experimentally by comparing (exp(x) - 1)/log(exp(x)) against expm1(x)/x (an expression guaranteed to have low relative error, since division never makes errors much worse):
>>> for i in range(10):
...     x = 10**-i
...     u = (exp(x) - 1)/log(exp(x))
...     v = expm1(x)/x
...     print(u, v, abs((u - v)/v))
...
1.718281828459045 1.718281828459045 0.0
1.0517091807564762 1.0517091807564762 0.0
1.0050167084168058 1.0050167084168058 0.0
1.0005001667083415 1.0005001667083417 2.2193360112628554e-16
1.0000500016667082 1.0000500016667084 2.220335028798222e-16
1.0000050000166667 1.0000050000166667 0.0
1.0000005000001666 1.0000005000001666 0.0
1.0000000500000017 1.0000000500000017 0.0
1.000000005 1.0000000050000002 2.2204460381480824e-16
1.0000000005 1.0000000005 0.0
This approximation x + δ to the numerator and the denominator is best for x closest to zero and worst for x farthest from zero. But as x gets farther from zero, the error magnification in exp(x) - 1 (from catastrophic cancellation) and in log(exp(x)) (from ill-conditioning of the logarithm near 1) both diminish anyway, so the answer remains accurate. However, the benign cancellation in the second definition only works until x is so close to zero that exp(x) is simply rounded to 1. At that point, exp(x) - 1 and log(exp(x)) both give zero, and you will end up trying to compute 0/0, which yields NaN and raises a floating-point exception.
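In Python specifically, 0.0/0.0 raises ZeroDivisionError rather than returning NaN, so the breakdown is easy to trigger:

>>> x = 1e-20
>>> exp(x)        # rounds to exactly 1.0
1.0
>>> expm1(x)/x    # the rewritten form is still fine
1.0
>>> (exp(x) - 1)/log(exp(x))
Traceback (most recent call last):
  ...
ZeroDivisionError: float division by zero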
So you should use expm1(x)/x in practice, but it is a happy accident that, outside the edge case where exp(x) is 1, two wrongs make a right in (exp(x) - 1)/log(exp(x)), giving good accuracy even as (exp(x) - 1)/x and (for similar reasons) expm1(x)/log(exp(x)) both give bad accuracy.
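If you want this packaged up, here is a minimal sketch (the helper name is mine, not a library function), handling the x = 0 endpoint explicitly so the 0/0 case never arises:

from math import expm1

def expm1_over_x(x):
    """Compute (e**x - 1)/x with low relative error."""
    if x == 0.0:
        return 1.0  # the limit of (e**x - 1)/x as x -> 0
    return expm1(x)/x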