
Edit: This question covers two topics:

  • The efficiency of using double in place of float
  • Float precision following rounding

Is there any reason why I should not always use Java double instead of float?

I ask this question because the test code below fails when using floats, and it is not clear why, since the only difference is the use of float instead of double.

package com.icode.common;

import java.math.BigDecimal;

import org.junit.Assert;
import org.junit.Test;

public class BigDecimalTest {

    @Test public void testDeltaUsingDouble() { //test passes
        BigDecimal left = new BigDecimal("0.99").setScale(2, BigDecimal.ROUND_DOWN);
        BigDecimal right = new BigDecimal("0.979").setScale(2, BigDecimal.ROUND_DOWN);

        Assert.assertEquals(left.doubleValue(), right.doubleValue(), 0.09);
        Assert.assertEquals(left.doubleValue(), right.doubleValue(), 0.03);

        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.02);
        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.01);
        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.0);
    }

    @Test public void testDeltaUsingFloat() { //test fails on 'failing assert'
        BigDecimal left = new BigDecimal("0.99").setScale(2, BigDecimal.ROUND_DOWN);
        BigDecimal right = new BigDecimal("0.979").setScale(2, BigDecimal.ROUND_DOWN);

        Assert.assertEquals(left.floatValue(), right.floatValue(), 0.09);
        Assert.assertEquals(left.floatValue(), right.floatValue(), 0.03);

        /* failing assert */
        Assert.assertNotEquals(left.floatValue() + " - " + right.floatValue() + " = "
                + (left.floatValue() - right.floatValue()),
                left.floatValue(), right.floatValue(), 0.02);
        Assert.assertNotEquals(left.floatValue(), right.floatValue(), 0.01);
        Assert.assertNotEquals(left.floatValue(), right.floatValue(), 0.0);
    }
}

Fail Message:

java.lang.AssertionError: 0.99 - 0.97 = 0.01999998. Actual: 0.9900000095367432
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failEquals(Assert.java:185)
at org.junit.Assert.assertNotEquals(Assert.java:230)
at com.icode.common.BigDecimalTest.testDeltaUsingFloat(BigDecimalTest.java:34)

Any idea why this test fails, and why I shouldn't just always use double instead of float? Of course, I am looking for a reason other than a double being wider than a float.

Edit: The funny thing is that Assert.assertNotEquals(double, double, delta) takes double in both cases, so the floats returned in the failing test are widened to double anyway. Why the test failure, then?
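
Here is a minimal standalone sketch of that widening (the class name is just illustrative). The float-to-double conversion itself is exact, so the rounding error introduced by floatValue() survives the widening:

import java.math.BigDecimal;

public class WideningDemo {
    public static void main(String[] args) {
        // floatValue() rounds to the nearest float; widening that float to
        // double afterwards is exact, so the float rounding error is preserved.
        double leftWidened = new BigDecimal("0.99").floatValue();
        double rightWidened = new BigDecimal("0.97").floatValue();

        System.out.println(leftWidened);                 // 0.9900000095367432 (not 0.99)
        System.out.println(rightWidened);                // roughly 0.97000003
        System.out.println(leftWidened - rightWidened);  // roughly 0.0199999809, below the 0.02 delta
    }
}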

Edit: Maybe this other question is related, not sure though: hex not the same

Edit: From the answer to that question, hex not the same, it can be concluded that the IEEE 754 representation of .99 as a float differs from its representation as a double. This is due to rounding.

Hence we get this:

  • 0.99 - 0.97 = 0.01999998 //in float case
  • 0.99 - 0.97 = 0.020000000000000018 //in double case

Since the maximum delta in the failing assert is 0.02, and 0.01999998 (the float result) is below that delta, the two numbers are considered equal; the test asserts that they are not equal, hence it fails. The short check below reproduces both the representation difference and the two subtractions.
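
A minimal sketch of that check (the class name is just illustrative):

public class DeltaCheck {
    public static void main(String[] args) {
        // The IEEE 754 representations of .99 differ between float and double:
        System.out.println(Float.toHexString(0.99f));  // 0x1.fae148p-1
        System.out.println(Double.toHexString(0.99));  // 0x1.fae147ae147aep-1

        float fDiff = 0.99f - 0.97f;   // roughly 0.01999998, below the 0.02 delta
        double dDiff = 0.99 - 0.97;    // roughly 0.020000000000000018, above the 0.02 delta

        System.out.println(fDiff <= 0.02);  // true  -> assertNotEquals(..., 0.02) fails for floats
        System.out.println(dDiff <= 0.02);  // false -> assertNotEquals(..., 0.02) passes for doubles
    }
}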

Guys, do you agree with all this?

jakstack
  • It seems there is a peculiarity with float subtraction and rounding: http://stackoverflow.com/questions/13263650/float-number-is-not-the-expected-number-after-subtraction – Francis Oct 25 '13 at 13:42

2 Answers


The documentation for BigDecimal is silent about how floatValue() rounds. I presume it uses round-to-nearest, ties-to-even.

left and right are set to .99 and .97, respectively. When these are converted to double in round-to-nearest mode, the results are 0.9899999999999999911182158029987476766109466552734375 (in hexadecimal floating-point, 0x1.fae147ae147aep-1) and 0.9699999999999999733546474089962430298328399658203125 (0x1.f0a3d70a3d70ap-1). When those are subtracted, the result is 0.020000000000000017763568394002504646778106689453125, which clearly exceeds .02.

When .99 and .97 are converted to float, the results are 0.9900000095367431640625 (0x1.fae148p-1) and 0.9700000286102294921875 (0x1.f0a3d8p-1). When those are subtracted, the result is 0.019999980926513671875, which is clearly less than .02.
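
These exact values can be reproduced with the BigDecimal(double) constructor, which captures the binary value of its argument exactly (a float argument is widened to double first, which is lossless). A small sketch:

import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(0.99));   // 0.98999999999999999111...
        System.out.println(new BigDecimal(0.97));   // 0.96999999999999997335...
        System.out.println(new BigDecimal(0.99f));  // 0.9900000095367431640625
        System.out.println(new BigDecimal(0.97f));  // 0.9700000286102294921875

        System.out.println(new BigDecimal(0.99).subtract(new BigDecimal(0.97)));
        // 0.02000000000000001776...   (exceeds .02)
        System.out.println(new BigDecimal(0.99f).subtract(new BigDecimal(0.97f)));
        // 0.019999980926513671875     (less than .02)
    }
}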

Simply put, when a decimal numeral is converted to floating-point, the rounding may be up or down. It depends on where the number happens to lie relative to the nearest representable floating-point values. If it is not controlled or analyzed, it is practically random. Thus, sometimes you end up with a greater value than you might have expected, and sometimes you end up with a lesser value.

Using double instead of float would not guarantee that results similar to the above do not occur. It is merely happenstance that the double value in this case exceeded the exact mathematical value and the float value did not. With other numbers, it could be the other way around. For example, with double, .09 - .07 is less than .02, but, with float, .09f - .07f is greater than .02.
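
A quick sketch to verify that claim (class name is arbitrary):

public class RoundingDirection {
    public static void main(String[] args) {
        // With these constants, the double difference rounds below 0.02
        // and the float difference rounds above it, the opposite of .99 - .97.
        System.out.println(0.09 - 0.07 > 0.02);    // false
        System.out.println(0.09f - 0.07f > 0.02);  // true
    }
}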

There is a lot of information about how to deal with floating-point arithmetic, such as Handbook of Floating-Point Arithmetic. It is too large a subject to cover in Stack Overflow questions. There are university courses on it.

Often on today’s typical processors, there is little extra expense for using double rather than float; simple scalar floating-point operations are performed at nearly the same speeds for double and float. Performance differences arise when you have so much data that the time to transfer them (from disk to memory or memory to processor) becomes important, or the space they occupy on disk becomes large, or your software uses SIMD features of processors. (SIMD allows processors to perform the same operation on multiple pieces of data, in parallel. Current processors typically provide about twice the bandwidth for float SIMD operations as for double SIMD operations or do not provide double SIMD operations at all.)

Eric Postpischil

A double can represent numbers with more significant digits and over a greater range than a float. Computations on double are more costly in terms of CPU, so it all depends on your application. Binary floating-point cannot exactly represent a number such as 1/5. Such numbers end up being rounded, thereby introducing errors that are certainly at the origin of your failed asserts. See http://en.m.wikipedia.org/wiki/Floating_point for more details. A small illustration follows below.
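
A tiny sketch of that rounding (the BigDecimal(double) constructor shows the exact stored value):

import java.math.BigDecimal;

public class OneFifth {
    public static void main(String[] args) {
        // 1/5 has no finite binary expansion, so both types store the nearest
        // representable binary fraction instead.
        System.out.println(new BigDecimal(0.2));   // 0.200000000000000011102230246251565404236316680908203125
        System.out.println(new BigDecimal(0.2f));  // 0.20000000298023223876953125
    }
}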

[EDIT] If all else fails, run a benchmark:

package doublefloat;

/**
 *
 * @author tarik
 */
public class DoubleFloat {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        long t1 = System.nanoTime();
        double d = 0.0;
        for (long i=0; i<1000000000;i++) {
            d = d * 1.01;
        }
        long diff1 = System.nanoTime()-t1;
        System.out.println("Double ticks: " + diff1);

        t1 = System.nanoTime();
        float f = 0.0f;
        for (long i=0; i<1000000000;i++) {
            f = f * 1.01f;
        }
        long diff2 = System.nanoTime()-t1;
        System.out.println("Float  ticks: " + diff2);
        System.out.println("Difference %: " + (diff1 - diff2) * 100.0 / diff1);    
    }
}

Output:

Double ticks: 3694029247
Float  ticks: 3355071337
Difference %: 9.175831790592209

This test was run on a PC with an Intel Core 2 Duo. Note that since we are only dealing with a single variable in a tight loop, there is no way to overwhelm the available memory bandwidth. In fact, one of the cores consistently showed 100% CPU during each run. Conclusion: the difference is 9%, which might indeed be considered negligible.

The second test involves the same computation but uses a relatively large amount of memory: roughly 280 MB for the float array and 560 MB for the double array (70,000,000 elements each):

package doublefloat;

/**
 *
 * @author tarik
 */
public class DoubleFloat {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        final int LOOPS = 70000000;
        long t1 = System.nanoTime();
        double d[] = new double[LOOPS];
        d[0] = 1.0;
        for (int i=1; i<LOOPS;i++) {
            d[i] = d[i-1] * 1.01;
        }
        long diff1 = System.nanoTime()-t1;
        System.out.println("Double ticks: " + diff1);

        t1 = System.nanoTime();
        float f[] = new float[LOOPS];
        f[0] = 1.0f;
        for (int i=1; i<LOOPS;i++) {
            f[i] = f[i-1] * 1.01f;
        }
        long diff2 = System.nanoTime()-t1;
        System.out.println("Float  ticks: " + diff2);
        System.out.println("Difference %: " + (diff1 - diff2) * 100.0 / diff1);    
    }
}

Output:

Double ticks: 667919011
Float  ticks: 349700405
Difference %: 47.64329218950769

Memory bandwidth is overwhelmed, yet I can still see the CPU peaking at 100% for a short period of time.

Conclusion: This benchmark somewhat confirms that using double takes about 9% more time than float in CPU-intensive code and about 50% more time in data-intensive code. It also confirms Eric Postpischil's note that the CPU overhead is relatively small (9%) compared with the performance impact of limited memory bandwidth.

Tarik
  • +1 regarding CPU usage, reading a long takes two operations, I think the same applies to double as it's also 64 bit. Thanks – jakstack Oct 25 '13 at 13:24
  • Not sure it is the case on a 64 bit computer on a properly aligned memory location. – Tarik Oct 25 '13 at 13:29
  • It seems rounding is only happening in the float case and not in double, but why is that, since the numbers used are small enough for both data types? Unless of course left - right produces a very long decimal that needs rounding, which may actually be the answer, as in the double case the data type is wide enough to hold the result without rounding. – jakstack Oct 25 '13 at 13:32
  • Good point regarding the 64 bit machine, so I am not sure how using double makes it more costly, especially since Java achieves very high performance in benchmarks. – jakstack Oct 25 '13 at 13:34
  • "In single-precision the latency is two clock cycles and in double-precision the latency is three clock cycles" See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.4451 – Tarik Oct 25 '13 at 19:34
  • @Tarik: That is a 16-year-old paper. Modern processors in the consumer market typically have fully pipelined floating-point units with little difference in latency between single- and double-precision. For scalar code, performance is similar. Performance differences typically manifest only when data is so voluminous that disk and memory transfers become important or when SIMD is used, since current SIMD features generally provide twice the bandwidth for single-precision operations that they do for double-precision. – Eric Postpischil Oct 26 '13 at 21:53
  • @EricPostpischil Thanks for this precision. The bottom line is that optimization only matters on that part of the code that is so execution intensive that it hits one of the computing bottlenecks: CPU, cache, memory bandwidth, I/O...and we can get as technical as we want here. Accordingly, I was correct in that double computation is more expensive than float but arguably incorrect on which bottleneck would be hit first. – Tarik Oct 27 '13 at 02:17
  • @EricPostpischil See updated answer that takes your remark into account. – Tarik Oct 27 '13 at 11:11