how to improve the precision of computing float numbers?

Question

I write a code snippet in Microsoft Visual Studio Community 2019 in C++ like this:

int m = 11;
int p = 3;
float step = 1.0 / (m - 2 * p);

the variable step is 0.200003, 0.2 is what i wanted. Is there any suggestion to improve the precision?

This problem comes from UNIFORM KNOT VECTOR. Knot vector is a concept in NURBS. You can think it is just an array of numbers like this: U[] = {0, 0.2, 0.4, 0.6, 0.8, 1.0}; The span between two adjacent numbers is a constant. The size of knot vector can be changed accroding to some condition, but the range is in [0, 1].

the whole function is:

typedef float NURBS_FLOAT;
void CreateKnotVector(int m, int p, bool clamped, NURBS_FLOAT* U)
{
    if (clamped)
    {
        for (int i = 0; i <= p; i++)
        {
            U[i] = 0;
        }

        NURBS_FLOAT step = 1.0 / (m - 2 * p);
        for (int i = p+1; i < m-p; i++)
        {
            U[i] = U[i - 1] + step;
        }

        for (int i = m-p; i <= m; i++)
        {
            U[i] = 1;
        }
    }
    else
    {
        U[0] = 0;
        NURBS_FLOAT step = 1.0 / m;
        for (int i = 1; i <= m; i++)
        {
            U[i] = U[i - 1] + step;
        }
    }
}

The typical way you would handle this would be to use an exact type, which is not subject to floating point error. I don't know what that would be in C++ though. — Tim Biegeleisen, Oct 22 '19 at 09:46
You can use `double` which has a greater precision than `float`. — Nathanael Demacon, Oct 22 '19 at 09:47
Related : [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — Sander De Dycker, Oct 22 '19 at 09:49
Are you trying to generate (or loop through) a sequence like 5, 5.2, 5.4, 5.6, ... , 9.8, 10 with the highest precision possible? — Bob__, Oct 22 '19 at 09:50
I have a test code like this: float step = 1.0f / 5.0f The error exists too. — licb, Oct 22 '19 at 09:56
Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — lisyarus, Oct 22 '19 at 11:24
What do you need `step` for? Most likely, you can avoid having to compute it. (Adding a step on each time is not a good way to go from one number to another because errors accumulate with each addition. Instead, compute each step without re-using previous results.) — David Schwartz, Oct 22 '19 at 11:28

Ruslan · Answer 1 · 2019-10-22T11:33:32.267

Let's follow what's going on in your code:

The expression 1.0 / (m - 2 * p) yields 0.2, to which the closest representable double value is 0.200000000000000011102230246251565404236316680908203125. Notice how precise it is – to 16 significant decimal digits. It's because, due to 1.0 being a double literal, the denominator is promoted to double, and the whole calculation is done in double precision, thus yielding a double value.
The value obtained in the previous step is written to step, which has type float. So the value has to be rounded to the closest representable value, which happens to be 0.20000000298023223876953125.

So your cited result of 0.200003 is not what you should get. Instead, it should be closer to 0.200000003.

Is there any suggestion to improve the precision?

Yes. Store the value in a higher-precision variable. E.g., instead of float step, use double step. In this case the value you've calculated won't be rounded once more, so precision will be higher.

Can you get the exact 0.2 value to work with it in the subsequent calculations? With binary floating-point arithmetic, unfortunately, no. In binary, the number 0.2 is a periodic fraction:

0.2₁₀ = 0.0̅0̅1̅1̅₂ = 0.0011 0011 0011...₂

See Is floating point math broken? question and its answers for more details.

If you really need decimal calculations, you should use a library solution, e.g. Boost's cpp_dec_float. Or, if you need arbitrary-precision calculations, you can use e.g. cpp_bin_float from the same library. Note that both variants will be orders of magnitude slower than using bulit-in C++ binary floating-point types.

score 1 · Answer 2 · answered Oct 22 '19 at 11:40

One solution to avoid drift (which I guess is your worry?) is to manually use rational numbers, for example in this case you might have:

// your input values for determining step
int m = 11;
int p = 3;

// pre-calculate any intermediate values, which won't have rounding issues
int divider = (m - 2 * p); // could be float or double instead of int

// input
int stepnumber = 1234; // could also be float or double instead of int

// output
float stepped_value = stepnumber * 1.0f / divider;

In other words, formulate your problem so that step of your original code is always 1 (or whatever rational number you can represent exactly using 2 integers) internally, so there is no rounding issue. If you need to display the value for user, then you can do it just for display: 1.0 / divider and round to suitable number of digits.

Another option is to use [`boost::multiprecision::cpp_rational`](https://www.boost.org/doc/libs/1_71_0/libs/multiprecision/doc/html/boost_multiprecision/tut/rational/cpp_rational.html) for complete implementation of rational numeric type. — Ruslan, Oct 22 '19 at 12:02

score 1 · Accepted Answer · answered Oct 22 '19 at 13:58

When dealing with floating point math a certain amount of rounding errors are expected.

For starters, values like 0.2 aren't exactly represented by a float, or even a double:

std::cout << std::setprecision(60) << 0.2 << '\n';
// ^^^ It outputs something like: 0.200000000000000011102230246251565404236316680908203125

Besides, the errors may accumulate when a sequence of operations are performed on imprecise values. Some operations, like summation and subctraction, are more sensitive to this kind of errors than others, so it'd be better to avoid them if possible.

That seems to be the case, here, where we can rewrite OP's function into something like the following

#include <iostream>
#include <iomanip>
#include <vector>
#include <algorithm>
#include <cassert>
#include <type_traits>

template <typename T = double> 
auto make_knots(int m, int p = 0)   // <- Note that I've changed the signature. 
{
    static_assert(std::is_floating_point_v<T>);
    std::vector<T> knots(m + 1);
    int range = m - 2 * p;
    assert(range > 0);
    for (int i = 1; i < m - p; i++)
    {
        knots[i + p] = T(i) / range;  // <- Less prone to accumulate rounding errors
    }
    std::fill(knots.begin() + m - p, knots.end(), 1.0);
    return knots;
}

template <typename T>
void verify(std::vector<T> const& v)
{
    bool sum_is_one = true;
    for (int i = 0, j = v.size() - 1; i <= j; ++i, --j)
    {
        if (v[i] + v[j] != 1.0)   // <- That's a bold request for a floating point type
        {
            sum_is_one = false;
            break;
        }
    }
    std::cout << (sum_is_one ? "\n" : "Rounding errors.\n");
}

int main()
{
    // For presentation purposes only
    std::cout << std::setprecision(60) << 0.2 << '\n';
    std::cout << std::setprecision(60) << 0.4 << '\n';
    std::cout << std::setprecision(60) << 0.6 << '\n';
    std::cout << std::setprecision(60) << 0.8 << "\n\n";

    auto k1 = make_knots(11, 3);
    for (auto i : k1)
    {
        std::cout << std::setprecision(60) << i << '\n';
    }
    verify(k1);

    auto k2 = make_knots<float>(10);
    for (auto i : k2)
    {
        std::cout << std::setprecision(60) << i << '\n';
    }
    verify(k2);
}

Testable here.

how to improve the precision of computing float numbers?

3 Answers3