How to compute total floating-point rounding error of a set of arithmetic computations in java using Math.ulp(double)?

Question

I want to calculate the floating-point rounding error of a series of additions, multiplications, and divisions using the Math.ulp(double) method in Java. According to the wiki page on Unit in the Last place (ULP), it seems that the error from one floating-point calculation, say 2+3 or 2*3 would be 0.5*ulp(2+3) or 0.5*ulp(2*3), respectively, where 2*3 and 2+3 are the floating-point calculations. However, adding up these errors does not account for the actual error I get in the end product. Saying the maximum error, for example, of 2+3*4 = 0.5*ulp(2+[3*4]) + 0.5*ulp(3*4) does not seem to account for the actual error I get. Thus, I am confused, and perhaps I am misunderstanding Math.ulp(double) or maybe I need to use some kind of relative error. I don't know. Could anyone explain this to me and maybe give a few examples of addition, multiplication, and division with floating-point and exact numbers? It would be much appreciated.

I'm attempting to calculate the reduced row echelon form of a matrix for a Matrix class, and I need to know if, after a few calculations, certain items in the two dimensional array I'm using for the calculation are equal to 0. If a row is all zeroes, I exit the code. If it has a nonzero number in it, I divide that number by itself and then perform Gaussian elimination. The problem is that after performing a series of operations, floating-point error can creep in and calculations that should result in a zero end up as a nonzero number which then messes up my matrix calculation. Thus, I am trying to change the condition under which Gaussian elimination occurs from zero to less than a calculated error bound, and I am calculating the error bound for every item in the matrix based on the calculations done to that item, added together in a new error array. Here is my code:

/**
 * Finds the reduced row echelon form of the matrix using partial pivoting
 * @return rref: The reduced row echelon form of the matrix
 */
public Matrix rref()
{
    //ref()
    Matrix ref = copy();
    int iPivot = 0, jPivot = 0, greatestPivotRow;
    double[][] errorArray = new double[height][width];
    while(iPivot < height && jPivot < width)
    {
        do
        {
            //Finds row with greatest absolute-value-of-a-number at the horizontal value of the pivot position
            greatestPivotRow = iPivot;
            for(int n = iPivot; n < height; n++)
            {
                if(Math.abs(ref.getVal(n, jPivot)) > Math.abs(ref.getVal(greatestPivotRow, jPivot)))
                    greatestPivotRow = n;
            }
            //Swaps row at pivot with that row if that number is not 0 (Or less than the floating-point error)
            //If the largest number is 0, all numbers below in the column are 0, so jPivot increments and row swapper is repeated
            if(Math.abs(ref.getVal(greatestPivotRow, jPivot)) > errorArray[greatestPivotRow][jPivot])
                ref = ref.swapRows(iPivot, greatestPivotRow);
            else
                jPivot++;
        }
        while(jPivot < width && Math.abs(ref.getVal(greatestPivotRow, jPivot)) <= errorArray[greatestPivotRow][jPivot]); 
        if(jPivot < width)
        {
            //Pivot value becomes 1
            double rowMultiplier1 = 1/ref.getVal(iPivot,jPivot);
            for(int j = jPivot; j < width; j++)
            {
                ref.matrixArray[iPivot][j] = ref.getVal(iPivot,j) * rowMultiplier1;
                errorArray[iPivot][j] += 0.5 * (Math.ulp(ref.matrixArray[iPivot][j]) + Math.ulp(rowMultiplier1));
            }
            //1st value in nth row becomes 0
            for(int iTarget = iPivot + 1; iTarget < height; iTarget++)
            {
                double rowMultiplier0 = -ref.getVal(iTarget, jPivot)/ref.getVal(iPivot, jPivot);
                for(int j = jPivot; j < width; j++)
                {
                    errorArray[iTarget][j] += 0.5 * (Math.ulp(ref.getVal(iPivot, j) * rowMultiplier0) + Math.ulp(ref.getVal(iTarget, j)
                            + ref.getVal(iPivot, j)*rowMultiplier0) + Math.ulp(rowMultiplier0));
                    ref.matrixArray[iTarget][j] = ref.getVal(iTarget, j)
                            + ref.getVal(iPivot, j)*rowMultiplier0;
                }
            }
        }
        //Shifts pivot down 1 and to the right 1
        iPivot++;
        jPivot++;
    }

    //rref
    Matrix rref = ref.copy();
    iPivot = 1;
    jPivot = 1;
    //Moves pivot along the diagonal
    while(iPivot < height && jPivot < width)
    {
        //Moves horizontal position of pivot to first nonzero number in the row (the 1)
        int m = jPivot;
        while(m < width && Math.abs(rref.getVal(iPivot, m)) < errorArray[iPivot][m])
            m++;
        if(m != width)
        {
            jPivot = m;
            //1st value in rows above pivot become 0
            for(int iTarget = 0; iTarget < iPivot; iTarget++)
            {
                double rowMultiplier = -rref.getVal(iTarget, jPivot)/rref.getVal(iPivot, jPivot);
                for(int j = jPivot; j < width; j++)
                {
                    errorArray[iTarget][j] += 0.5 * (Math.ulp(rref.getVal(iTarget, j) * rowMultiplier) + Math.ulp(rref.getVal(iTarget, j)
                            + rref.getVal(iPivot, j)*rowMultiplier) + Math.ulp(rowMultiplier));
                    rref.matrixArray[iTarget][j] = rref.getVal(iTarget, j)
                            + rref.getVal(iPivot, j)*rowMultiplier;
                }
            }
        }
        iPivot++;
        jPivot++;
    }
    //Get rid of floating-point errors in integers
    for(int i = 0; i < height; i++)
    {
        for(int j =0; j < width; j++)
        {
            if(Math.abs(rref.getVal(i, j) - (int)(rref.getVal(i, j) + 0.5)) <= errorArray[i][j])
                rref.matrixArray[i][j] = (int)(rref.getVal(i, j) + 0.5);
        }
    }
    return rref;
}

The last part of the code, converting floating-point numbers less than the calculated error away from an integer value to that integer value is mostly to tell me if my error formula has worked, since some of the matrices I am calculating end up with, instead of integers, 5.000000000000004s and the like. Therefore, I know if I have a number very close to an integer but not the integer, I also know that my error bounds are not large enough, and apparently they are not, so I think I am doing something incorrectly.

My input matrix was one with the instance variable

double[][] matrixArray = {{1,-2,0,0,3}, {2,-5,-3,-2,6}, {0,5,15,10,0}, {2,6,18,8,6}};

And my result was the array

[[1.0, 0.0, 0.0, -2.0000000000000013, 3.0], [0.0, 1.0, 0.0, -1.0000000000000004, 0.0], [0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]

Though my error calculations fixed the problem with zeroes being turned into ones and then used in Gaussian elimination, I still have numbers that are not integers, so I know my error bounds are inaccurate. It may have worked in this case, but might not in the next without the correct error bounds.

Are you trying to calculate the _exact_ error, not just a bound on the error? There's not likely to be a simple formula for that, in terms of ulp or anything. (In any event, +, -, and * on integers with results < 2^52 will have no error at all.) — Louis Wasserman, Dec 26 '15 at 00:10
then what's wrong with the formulas you're using? The actual error in those examples will be less than the bound you're calculating. — Louis Wasserman, Dec 26 '15 at 00:28
This link has a good explanation of what ULP and relative error are all about. http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html — jrook, Dec 26 '15 at 00:56
@Louis Well, my actual error is still greater than my predicted error, so I'm assuming there is something wrong with my error calculation. — abeta201, Dec 26 '15 at 01:05
How do you know what the actual error is? (The actual error for 2 + 3 is 0.) — Louis Wasserman, Dec 26 '15 at 01:08
@jrook I understand what ULP is about- I have read a large part of the 90-some pages in that link, but I am asking how to apply the ulp to calculation of TOTAL error bound over some number of calculations. Do I add relative error? Do the +,-,*, and / operations have 0.5 ulps of error in the JVM? If I do 1 / [some floating point number], is that different from [one floatingpoint number] / [another]? — abeta201, Dec 26 '15 at 01:11
@Louis In a test case, I know that my answer should be an integer, not [that integer] + .0000000000000004, so my answer is off by 4e-16, the actual error. — abeta201, Dec 26 '15 at 01:13
Determining (tight) error bounds for sequences of floating-point operations is very much a non-trivial process, and entire books have been written to address various aspects of this issue, from J. H. Wilkinson's "Rounding Errors in Algebraic Processes" (1965) to Nicholas J. Higham's "Accuracy and Stability of Numerical Algorithms 2nd ed." (2002). I therefore feel the question is too broad, and only tangentially related to programming, but will refrain from a close vote for now (maybe someone can squeeze an answer into the typical SO answer format). — njuffa, Dec 26 '15 at 01:56
@abeta201 As far as I understand, If you want to determine your error compared to the exact value, you need to determine your radix, how many digits of precision you are using for your calculations, and the expression you are trying to evaluate. Without this information, it can only be said that IEEE 754-64 bit is precise up to 52 binary digits. Math.ulp gives the distance to the next double. The table in this link is what Math.ulp gives : http://www.exploringbinary.com/the-spacing-of-binary-floating-point-numbers/ — jrook, Dec 26 '15 at 02:31
@njuffa Good thing I don't care about extra-tight error bounds. Also good I'm not using transcendental functions. Simple addition, multiplication, subtraction, and division, according to Wikipedia (https://en.wikipedia.org/wiki/Unit_in_the_last_place), usually have an error bound of no more than 0.5*ulp([floating-point solution]) I think. Which is why I'm wondering why my program does not calculate enough error. — abeta201, Dec 26 '15 at 03:26
@abeta201 To make your question more concrete, you might want to show the complete code, list the possible range of each of the inputs, and state your anticipated as well as observed bounds. — njuffa, Dec 26 '15 at 07:33
Thank you all for your help with this. I believe I found the issue in my code. Though it would still be nice to have someone confirm that I was correctly calculating error bounds correctly, my actual error is now less than my predicted error. Thanks! — abeta201, Dec 26 '15 at 17:45
@abeta201 : I can't understand your error formula. Can you please explain how you came up with it? (Or give a reference to study). The way I understand Math.ulp(), I think the last loop in your code would be the perfect place to use it to assert a floating point calculation has been good enough: assertEquals(expectedValue, actualValue, Math.ulp(expectedValue)); http://www.ibm.com/developerworks/library/j-math2/ — jrook, Dec 26 '15 at 19:00

Pascal Cuoq · Answer 1 · 2015-12-27T20:06:18.107

2+3*4 = 0.5*ulp(2+[3*4]) + 0.5*ulp(3*4)

Errors compound. Like interest, the final error can grow exponentially. The operations in your example are exact, so it's difficult to see what you are complaining about (surely you did get exactly 14?). Are you taking into account the representation error that cause the constants involved in your computations not to be the mathematical values but also a 0.5ULP approximation of them?

In addition to the exponential growth of error when computed statically with the necessary precision, there is the problem that you are using inaccurate floating-point math to compute errors:

errorArray[iTarget][j] += 0.5 * (Math.ulp(rref.getVal(iTarget, j) * rowMultiplier) + Math.ulp(rref.getVal(iTarget, j)

The actual error can be more than computed by this statement because nothing prevents the floating-point addition from being a lower approximation of the mathematical result (the multiplications happen to be likely exact because one of the multiplicand is a power of two in each case).

In another programming language, you could change the rounding mode to “upwards” for this computation, but Java does not provide access to this functionality.

Here are a bunch of tangentially relevant remarks:

When the mathematically expected result is an integer, the usual way to obtain a double that is that integer is to ensure a 1ULP error for the entire computation. You almost never get a 1ULP bound for a computation that involves more than a couple of operations, unless you take special steps to ensure this bound (for instance Dekker multiplication).

Java can use constants and print results in the hexadecimal format, and you should use that if you want to see what is happening exactly.

If you are interested in obtaining an upper bound on the final error along a specific computation, as opposed to statically for all computations, then interval arithmetic is slightly more accurate than characterization of error as a single absolute value, and requires much less thinking. In a context where you know by other means that the result has to be an integer, if the resulting interval contains only one integer, you would know for sure that this is the only possible answer.

score 0 · Answer 2 · edited May 23 '17 at 12:23

If you are interested in calculating error bounds for Gaussian elimination process, that's a very complicated issue. For example, this paper gives a formula on upper bound of error: Higham NJ, Higham DJ. Large Growth Factors in Gaussian Elimination with Pivoting. SIAM Journal on Matrix Analysis and Applications. 1989;10(2):155.

The formula is:

Which is by no means simple!

On the other hand, if your goal is to prevent creeping floating point errors from ruining your zeros, I don't think you even need to create errorArray[][]. You will do fine by calculating in floating point and then setting a condition for precision with the help of Math.ulp() or machine epsilon. In this way, you won't need a final loop to "get rid" of those pesky zeros in the end!

You can also use java's BigDecimal and see if you get better results. Maybe this question and its given answers can help.

I don't know if I want to go into infinite matrix norms. I would simply like to know whether my maximum error calculations for a series of additions, multiplications, or divisions were vaguely accurate. I suppose it is the "condition for precision" you mentioned that I need an example for. I have no experience using Math.ulp. The end loop, though, was just to make sure my function worked- the pesky zeroes I avoided in my conditional statements. (Though that end loop did make my results much prettier, I have to say). I got my code working, but that doesn't mean I got my error analysis correct. — abeta201, Dec 29 '15 at 07:16

How to compute total floating-point rounding error of a set of arithmetic computations in java using Math.ulp(double)?

2 Answers2