Given the constraints on `a`, `b`, and `c`, we can compute the desired result as `(short) (a * lround(10*b) / lround(10*c))` or as `(short) (a * b / c + .00005)`. (This of course requires that the result be representable in `short`. That is not guaranteed by the stated limits on `a`, `b`, and `c`.)
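
As a concrete illustration, here is a minimal sketch showing both expressions in context (the `scanf`-based input and the variable names are my assumptions, not part of the original question):

```c
#include <math.h>    /* lround */
#include <stdio.h>

int main(void)
{
    int a;           /* an integer, 0 to 365 */
    double b, c;     /* ideally multiples of .1, up to 1000; c is at least 1 */

    if (scanf("%d %lf %lf", &a, &b, &c) != 3)
        return 1;

    /* Integer-arithmetic version: exact multiplication, truncating division. */
    short r1 = (short) (a * lround(10*b) / lround(10*c));

    /* Floating-point version: a small offset absorbs the rounding error
       before truncation. */
    short r2 = (short) (a * b / c + .00005);

    printf("%hd %hd\n", r1, r2);
    return 0;
}
```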
In the former, a•b/c is equivalent to a•(10b)/(10c), so we just need to show this is what the expression computes, including that the arithmetic does not suffer from rounding errors. We know `b` would ideally be a multiple of .1, so `10*b` would be an integer. `lround(10*b)` finds this integer, effectively correcting for any error that occurred in converting a decimal numeral to the `double` format. Similarly, `lround(10*c)` finds the ideal value of 10•c. `lround` returns a value in the `long` type, so the multiplication and the division are performed with integer arithmetic. Also, the `long` type is capable of representing the necessary range. (`a * lround(10*b)` is limited to 3,650,000, and `long` can represent up to at least 2,147,483,647.) So the multiplication is exact, and the division truncates the way we desire.
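
For instance (a hypothetical input, just to illustrate the correction), if the user enters 0.3 for `b` and 1.5 for `c`, the stored `b` is slightly below 0.3, yet the expression still yields the exact answer:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    int    a = 365;
    double b = 0.3;   /* stored as 0.29999999999999998889..., slightly low */
    double c = 1.5;   /* stored exactly */

    long tenB = lround(10*b);   /* the ideal 10*b, which is 3 */
    long tenC = lround(10*c);   /* the ideal 10*c, which is 15 */

    /* 365 * 3 / 15 = 1095 / 15 = 73, computed entirely in integer arithmetic. */
    printf("%ld\n", a * tenB / tenC);
    return 0;
}
```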
A proof for the latter follows.
The following assumes the IEEE-754 “double” format is used for `double`. It is sufficient that, after `<float.h>` is included, `#if DBL_MANT_DIG >= 53` is true.
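
One way to have the compiler check that assumption (a sketch; the message wording is mine):

```c
#include <float.h>

#if DBL_MANT_DIG < 53
    #error "double is narrower than IEEE-754 binary64; this analysis does not apply"
#endif
```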
Conversion of a numeral in a user-provided string to `double` ought to yield a number with error of at most one unit of least precision (ULP). This is suggested in various places in the C standard (and it is unclear whether certain parts of the text intend to require this). However, let’s assume we have a bad conversion with error up to 1024 ULP.
`b` and `c` can be up to 1000, for which the ULP is 2^−43, so 1024 ULP is 2^−33. Thus, if the user enters a value b, the stored `b` is b(1+e_b), where |e_b| ≤ 2^−33, and `c` is similarly c(1+e_c). As an integer up to 365, `a` is of course exactly the a entered by the user.
When `a * b` is computed, the result is ab(1+e_b)(1+e_0), where e_0 is the error introduced by the multiplication. In any rounding mode, the error is less than 1 ULP of the result. The maximum value at this point is 365,000, for which 1 ULP is 2^−34, so |e_0| < 2^−34.
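
These ULP values can be observed directly with `nextafter` (this little check is my addition, not part of the proof):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Spacing of doubles just above 1000 is 2^-43; just above 365,000 it is 2^-34. */
    printf("%a\n", nextafter(1000.0, INFINITY) - 1000.0);      /* 0x1p-43 */
    printf("%a\n", nextafter(365000.0, INFINITY) - 365000.0);  /* 0x1p-34 */
    return 0;
}
```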
Then we divide by `c`, for which the minimum value is 1. The result is ab(1+e_b)(1+e_0)/(c(1+e_c))•(1+e_1), where e_1 is the error introduced by the division. Again the maximum value is 365,000, so |e_1| < 2^−34.
Rearranging, the result is ab/c • (1+e_b)(1+e_0)(1+e_1)/(1+e_c). We can easily see a bound on the error is obtained when e_b is +2^−33, e_0 and e_1 are +2^−34, and e_c is −2^−33. Some work with a calculator shows us the error is less than 3.5•10^−10.
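
The “work with a calculator” can be reproduced mechanically; this throwaway check (my addition) evaluates the error factor at the stated extremes:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Worst case from the text: e_b = +2^-33, e_0 = e_1 = +2^-34, e_c = -2^-33. */
    long double eb =  ldexpl(1.0L, -33);
    long double e0 =  ldexpl(1.0L, -34);
    long double e1 =  ldexpl(1.0L, -34);
    long double ec = -ldexpl(1.0L, -33);

    long double factor = (1 + eb) * (1 + e0) * (1 + e1) / (1 + ec);
    printf("%.3Le\n", factor - 1);   /* about 3.492e-10, below 3.5e-10 */
    return 0;
}
```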
Note that ab/c equals 10ab/(10c) and consider the latter. The numerator is an integer, and the denominator is an integer that is at most 10,000. Therefore, the closest the quotient can be to an integer without being an integer is .0001. And the closest the result computed with rounding errors can then be to an integer, when the ideal result is not an integer, is more than .0001 − 3.5•10^−10. Therefore, if we add .00005 to the computed result, it will push every result that would have been an integer (without rounding errors) above that integer, but it will not push any result that would not have been an integer above the next integer. Therefore, the integer below `a * b / c + .00005` is the desired result, and casting to `short` provides that integer, if it is in range of `short`.
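
Putting the pieces together with the representability caveat from the beginning, a sketch (the function name, the `int` status return, and the choice of the integer-arithmetic variant are mine):

```c
#include <limits.h>
#include <math.h>

/* Computes the truncated quotient a*b/c under the stated constraints
   (a an integer from 0 to 365; b and c multiples of .1 up to 1000; c at
   least 1).  Returns 0 and stores the result through out on success, or
   -1 if the result does not fit in short. */
static int scaled_quotient(int a, double b, double c, short *out)
{
    long q = a * lround(10*b) / lround(10*c);
    if (q < SHRT_MIN || q > SHRT_MAX)
        return -1;
    *out = (short) q;
    return 0;
}
```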