Ensure float to be smaller than exact value

Question

I want to calculate a sum of the following form in C++

float result = float(x1)/y1+float(x2)/y2+....+float(xn)/yn

xi, yi are all integers. The result will be an approximation of the actual value. It is crucial that this approximation is smaller than or equal to the actual value. I can assume that all my values are finite and positive. I tried using nextafterf(x, 0) as in this code snippet.

cout.precision( 15 );
float a = 1.0f / 3.0f * 10; //3 1/3
float b = 2.0f / 3.0f * 10; //6 2/3
float af = nextafterf( a , 0 );
float bf = nextafterf( b , 0 );
cout << a << endl;
cout << b << endl;
cout << af << endl;
cout << bf << endl;
float sumf = 0.0f;
for ( int i = 1; i <= 3; i++ )
{
    sumf = sumf + bf;
}
sumf = sumf + af;
cout << sumf << endl;

As one can see, the correct result would be 3*6.666... + 3.333... = 23.3333... But as output I get:

3.33333349227905
6.66666698455811
3.33333325386047
6.66666650772095
23.3333339691162

Even though my summands are smaller than what they should represent, their sum is not. In this case applying nextafterf to sumf will give me 23.3333320617676 which is smaller. But does this always work? Is it possible that the rounding error gets so big that nextafterf still leaves me above the correct value?

I know that I could avoid this by implementing a class for fractions and calculating everything exactly. But I'm curious whether it is possible to achieve my goal with floats.

IEEE754/floats are subject to rounding and representation errors. If you absolutely need exact values consider using fractions or arbitrary precision numeric types. — BlamKiwi, Aug 19 '15 at 12:53
Set the rounding mode to `FE_DOWNWARD` and then do your calculation. Or `nextafterf(foo, -1.0/0.0)` your calculation 2n-1 times. — tmyklebu, Aug 19 '15 at 12:54
@RegretBomb: It's like you didn't even bother reading the question... — tmyklebu, Aug 19 '15 at 12:54
@tmyklebu Slightly redundant text doesn't mean I didn't read your question. — BlamKiwi, Aug 19 '15 at 12:56
@RegretBomb It's like you read so little that you didn't even see that the asker and the person telling you that your remarks are irrelevant are different persons. — Pascal Cuoq, Aug 19 '15 at 14:27
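
tmyklebu's second suggestion (compute normally, then step the result down 2n-1 times) could be sketched roughly as follows. This is only an illustration: the name `sum_then_step_down` is made up here, round-to-nearest is assumed, and the int-to-float conversions are assumed to be exact (values below 2^24). The count 2n-1 corresponds to the n divisions and n-1 additions that may each round upward.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: sums x[i]/y[i] in round-to-nearest, then steps the
// result down once per operation that may have rounded upward.
// Assumes n >= 1, positive inputs, and exact int-to-float conversions.
float sum_then_step_down(const int* x, const int* y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<float>(x[i]) / static_cast<float>(y[i]);
    }
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int k = 0; k < 2 * n - 1; ++k) {
        sum = std::nextafter(sum, neg_inf);  // one step per rounding operation
    }
    return sum;
}
```

With x = {20, 20, 20, 10} and y = {3, 3, 3, 3} the exact sum is 70/3, and the returned float should not exceed it. The price is that the result drifts further below the exact value as n grows.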

score 5 · Answer 1 · edited May 23 '17 at 11:59

Try changing the float rounding mode to FE_TOWARDZERO.

See code example here:

Change floating point rounding mode
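
A minimal sketch of what switching the rounding mode might look like, assuming the platform honours `fesetround` for float arithmetic and that all inputs are small enough to convert to float exactly. The function name `sum_toward_zero` and the use of `volatile` to discourage compile-time folding are choices made for this sketch, not something taken from the linked question.

```cpp
#include <cfenv>

// Hypothetical helper: sums x[i]/y[i] with the rounding mode set toward zero.
// The volatile qualifiers are one way to keep the compiler from folding or
// reordering the arithmetic; they are an assumption of this sketch.
float sum_toward_zero(const int* x, const int* y, int n) {
    const int old_mode = std::fegetround();
    std::fesetround(FE_TOWARDZERO);
    volatile float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        volatile float fx = static_cast<float>(x[i]);
        volatile float fy = static_cast<float>(y[i]);
        volatile float q = fx / fy;  // quotient truncated toward zero
        sum = sum + q;               // partial sum truncated toward zero
    }
    const float result = sum;
    std::fesetround(old_mode);       // restore the caller's rounding mode
    return result;
}
```

Because every value is positive, truncating toward zero is the same as rounding down here. Whether a given compiler respects the mode for constant expressions is a separate matter, as the comments below discuss.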

answered Aug 19 '15 at 12:54 · Support Ukraine

This doesn't work for me. Some values are still rounded toward infinity. I read that `#pragma STDC FENV_ACCESS` is compiler-dependent. Could this be the issue here? I'm using gcc 4.8.4. – Ricardo Aug 19 '15 at 13:36
Setting the rounding mode worked as `fegetround` gives another result than before. But sadly this does not affect the result of the rounding. – Ricardo Aug 19 '15 at 13:42
@Ricardo I get the sum 23.3333327770233 - is the posted code 100% identical to your running code? – Support Ukraine Aug 19 '15 at 14:25
Thanks a lot. I removed the `nextafterf` function calls believing that the rounding will do what they did. If I keep them, I get a smaller result. But why is summing affected by the rounding type, but not divisions of the form `2.0f/3.0f` ? – Ricardo Aug 19 '15 at 14:55
@Ricardo: Most compilers will do constant folding in round-to-nearest by default. You need to specify `-frounding-math` with gcc (`#pragma STDC FENV_ACCESS ON` does *not* do the trick). I'm not even certain that `-frounding-math` completely works... – tmyklebu Aug 19 '15 at 15:53
@tmyklebu - true - it doesn't seem to work for all tool chains. I did some testing using different systems/tools and the result wasn't entirely as I expected. Anyway this was the best proposal I had. Do you know of a better solution? Despite the up-votes I got, I have a feeling that my answer doesn't cover the question entirely.... – Support Ukraine Aug 19 '15 at 19:11
@StillLearning: I don't know a better solution for people who want directed rounding. It's unfortunate that compiler vendors don't care about implementing their languages correctly, and it's also unfortunate that there's no real recourse besides whining like me on the Internet if you haven't got several months to learn a compiler's internal workings well enough to contribute. But that seems to be how life currently is. – tmyklebu Aug 19 '15 at 19:17
Changing the mode to `FE_TOWARDZERO` fails when converting `y1` to `float` results in a smaller value than the mathematically exact result. Using this smaller `y` value in the division could result in a larger than exact quotient and then not meet "approximation is smaller or equal to the actual value". – chux - Reinstate Monica May 22 '16 at 18:35
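
The conversion problem chux describes only shows up for integers that do not fit in a float's 24-bit significand (assuming IEEE 754 single precision). A small check along these lines, with the hypothetical name `converts_exactly`, can tell whether a given input is affected:

```cpp
// Hypothetical helper: true if converting v to float and back gives v again.
bool converts_exactly(int v) {
    const float f = static_cast<float>(v);
    return static_cast<long long>(f) == static_cast<long long>(v);
}
```

For example, 16777216 (2^24) converts exactly, while 16777217 does not, so a denominator like the latter is where a conversion rounded downward could push the quotient above the exact value.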

score 2 · Accepted Answer · answered Aug 19 '15 at 13:42

My immediate reaction is that the approach you're taking is fundamentally flawed.

The problem is that with floating point numbers, the size of step that nextafter will take will depend on the magnitude of the numbers involved. Let's consider a somewhat extreme example:

#include <iostream>
#include <iomanip>
#include <cmath>

int main() { 
    float num = 1.0e-10f;
    float denom = 1.0e10f;

    std::cout << std::setprecision(7) << num - std::nextafterf(num, 0) << "\n";
    std::cout << std::setprecision(7) << denom - std::nextafterf(denom, 0) << "\n";
}

Result:

6.938894e-018
1024

So, since the numerator is a lot smaller than the denominator, the increment is also much smaller.

The consequence seems fairly clear: if the denominator is nudged toward zero as well, the quotient is no longer guaranteed to be slightly smaller than the exact value; it can come out larger instead.

If you want to ensure the result is smaller than the correct number, the obvious choice would be to round the numerator down, but the denominator up (i.e. nextafterf(denom, positive_infinity)). This way, you get a smaller numerator and a larger denominator, so the result is always smaller than the un-modified version would have been.
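
As a rough illustration of that idea, the sketch below nudges the two operands in opposite directions before dividing. The name `quotient_down_up` is made up for this example, and both inputs are assumed to be positive and finite. Note that the division itself still rounds to nearest, so this shows the direction of the adjustment rather than a proof for every input.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: nudges the numerator toward zero and the denominator
// toward +infinity before dividing. Assumes both inputs are positive.
float quotient_down_up(float num, float denom) {
    const float pos_inf = std::numeric_limits<float>::infinity();
    const float lo_num = std::nextafter(num, 0.0f);
    const float hi_denom = std::nextafter(denom, pos_inf);
    return lo_num / hi_denom;  // the division itself still rounds to nearest
}
```

For example, quotient_down_up(1.0f, 3.0f) comes out below 1/3, whereas the plain 1.0f / 3.0f rounds to a value slightly above it.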

Interesting behaviour! Not sure if I understand your point, though. num and denom are always integers in my case. — Ricardo, Aug 19 '15 at 14:58
@Ricardo: I used floats for a simple example, but you can get exactly the same situation when the inputs are integers. — Jerry Coffin, Aug 19 '15 at 15:54

score 0 · Answer 3 · answered May 22 '16 at 18:31

The expression float result = float(x1)/y1+float(x2)/y2+....+float(xn)/yn has 3 places where rounding may occur:

1. Conversion of int to float: it is not always exact.
2. Division: floating point x / floating point y.
3. Addition: floating point quotient + floating point quotient.

By applying nextafterf() after each of these steps (stepping up or down as the equation needs), the result will certainly be less than the exact mathematical value. This approach may not generate the float closest to the exact answer, yet it will be close and certainly smaller.

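#include <float.h>   // FLT_MAX
#include <math.h>    // nextafterf
#include <stddef.h>  // size_t

// Returns a float below the exact sum of x[i]/y[i] (all values positive).
float foo(const int *x, const int *y, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) {  // assume x[0] is x1, x[1] is x2 ...
    float fx = nextafterf((float)x[i], 0.0f);     // numerator: step toward zero
    float fy = nextafterf((float)y[i], FLT_MAX);  // denominator: step up
    // divide slightly smaller by slightly larger, then step the quotient down
    float q = nextafterf(fx / fy, 0.0f);
    sum = nextafterf(sum + q, 0.0f);  // step each partial sum down as well
  }
  return sum;
}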
