
I have some blocks of code that do:

float total = <some float>;
double some_dbl = <some double>;

total *= some_dbl;

This elicits a compiler warning about the implicit double-to-float conversion. I want to silence it, but I don't like turning off such warnings; I'd rather cast types explicitly where needed. That got me thinking: is `(float)(total * some_dbl)` more accurate than `total * (float)some_dbl`? Is the answer compiler- or platform-specific?
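For reference, here is a minimal C++ sketch of the two explicit-cast spellings being compared, written with `static_cast` instead of C-style casts. The function names `scale_round_once` and `scale_in_float` are hypothetical, chosen only for illustration. Either form makes the double-to-float narrowing explicit, which is typically what an implicit-conversion warning on `total *= some_dbl` is asking for.

```cpp
// Hypothetical helpers contrasting two ways of writing `total *= some_dbl`
// with an explicit narrowing conversion.

// Multiply in double (total is promoted to double), then round once to float.
float scale_round_once(float total, double some_dbl) {
    return static_cast<float>(total * some_dbl);
}

// Round the factor to float first, then multiply in float.
float scale_in_float(float total, double some_dbl) {
    return total * static_cast<float>(some_dbl);
}
```

In the original code this would read `total = scale_round_once(total, some_dbl);`, or inline as `total = static_cast<float>(total * some_dbl);`.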

Better code example (linked below):

#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;

int main() {
    double d_total = 1.2345678;
    float f_total = (float)d_total;     // d_total rounded to float
    double some_dbl = 6.7809123;

    // Reference: the product computed entirely in double.
    double actual = (d_total * some_dbl);
    // f_total is promoted to double; the double product is narrowed once.
    float no_cast = (float)(f_total * some_dbl);
    // some_dbl is narrowed first, then float * float (the outer cast is redundant).
    float with_cast = (float)(f_total * (float)some_dbl);

    cout << "actual:               " << setprecision(25) << actual << endl;
    cout << "no_cast:              " << setprecision(25) << no_cast << endl;
    cout << "with_cast:            " << setprecision(25) << with_cast << endl;
    // The next representable float above no_cast (toward 500.0f).
    cout << "no_cast, nextafter:   " << setprecision(25) << nextafter(no_cast, 500.0f) << endl;

    cout << endl;

    // Differences from the double-precision reference.
    cout << "Diff no_cast:   " << setprecision(25) << actual - no_cast << endl;
    cout << "Diff with_cast: " << setprecision(25) << with_cast - actual << endl;
    return 0;
}

Edit: So, I gave this a shot. Among the examples I tried, I quickly found one where `total * (float)(some_dbl)` appears to be more accurate. I assume this isn't always the case: either it is luck of the draw, or the compiler truncates doubles when converting to float instead of rounding, which could give worse results. See: http://ideone.com/sRXj1z

Edit 2: I confirmed using `std::nextafter` that `(float)(total * some_dbl)` is returning the truncated value, and updated the linked code. This is quite surprising: if the compiler in this case always truncates doubles, then you can say `(float)some_dbl <= some_dbl`, which in turn implies `with_cast <= no_cast`. However, this is not the case! `with_cast` is not only greater than `no_cast`, it is also closer to the actual value, which is somewhat surprising given that we discard information before the multiplication occurs.
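One way to test the truncation hypothesis is sketched below, assuming IEEE 754 `float`/`double`. It checks whether a converted `float` is the nearest float to a given `double` by comparing it with its two neighbours obtained through `std::nextafter`. The reference matters here: the value that `(float)(f_total * some_dbl)` actually narrows is `(double)f_total * some_dbl`, and because `f_total` is already `d_total` rounded to float, that product is not the same as `actual = d_total * some_dbl`. The helper names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if `rounded` is a float nearest to `exact`,
// i.e. neither neighbouring float is strictly closer (ties count as nearest).
bool is_nearest_float(double exact, float rounded) {
    const float inf = std::numeric_limits<float>::infinity();
    const float below = std::nextafter(rounded, -inf);
    const float above = std::nextafter(rounded, inf);
    const double err = std::fabs(exact - rounded);
    return err <= std::fabs(exact - below) && err <= std::fabs(exact - above);
}

// Hypothetical helper: the double that (float)(f_total * some_dbl) narrows.
// It uses the already-rounded f_total, not the original d_total.
double product_being_narrowed(float f_total, double some_dbl) {
    return static_cast<double>(f_total) * some_dbl;
}
```

With the question's variables, `is_nearest_float(product_being_narrowed(f_total, some_dbl), no_cast)` and `is_nearest_float(actual, no_cast)` can be compared; if the conversion rounds to nearest, the first check should hold regardless of how the second one turns out.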

Rollie
  • `(float)(total * some_dbl)` should be more accurate because, well, math... and that's true for every language. – No Idea For Name Nov 04 '14 at 06:58
  • gcc has a flag `-ffast-math` with which it is no longer bound by some of the strictures that the standard places on floating point; it'd be interesting to see if that affects your result – M.M Nov 04 '14 at 07:41
  • Another "strange" thing you may find is that using `1.2345678f` in your source gives a different result to `(double)1.2345678` - the rounding at runtime may occur differently to the rounding at compile-time. – M.M Nov 04 '14 at 07:53
  • @MattMcNabb, updated a bit to try and handle; not sure what tricks the compiler is doing, so maybe playing with compile flags will change this behavior. – Rollie Nov 04 '14 at 07:57
  • Of course they're different. The latter version entails only single-precision arithmetic, while the former implies a promotion to double-precision of the left operand of the multiplication, followed by a demotion `(cast)`. This is perfectly in line with the C Standard's _6.3.1.8 Usual arithmetic conversions_, which states in part: _Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double._ And rounding being what it is, an over-precise result may round differently. – Iwillnotexist Idonotexist Nov 04 '14 at 12:00
  • It depends on your values which method is closer. Here is another example: http://ideone.com/xC8gYA. – Patrick Nov 06 '14 at 12:56
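The usual arithmetic conversions quoted in the comments can be checked at compile time. This is a minimal C++11 sketch using only `<type_traits>`: the product of a `float` and a `double` has type `double`, while `float * float` stays `float`. Only the type is fixed by this rule; the standard still allows floating expressions to be evaluated with more precision than their type, which `FLT_EVAL_METHOD` in `<cfloat>` reports.

```cpp
#include <type_traits>

// float * double: the float operand is converted to double first.
static_assert(std::is_same<decltype(1.0f * 1.0), double>::value,
              "float * double has type double");

// float * float: no conversion, the result type is float.
static_assert(std::is_same<decltype(1.0f * 1.0f), float>::value,
              "float * float has type float");

// So in the question's code, no_cast performs one double multiply followed by
// one narrowing, while with_cast narrows some_dbl and then multiplies in float.
```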

4 Answers


It will make a difference depending on the size of the numbers involved, because double does not just offer more precision; it can also hold numbers far larger than float can. Here's a sample that shows one such instance:

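#include <float.h>
#include <stdio.h>

int main(void) {
    double d = FLT_MAX * 2.0;   /* too large for float, fine for double */
    float f = 1.0f / FLT_MAX;

    printf("%f\n", d * f);          /* f is promoted; multiply in double */
    printf("%f\n", (float)d * f);   /* d is out of float range: inf on IEEE 754 platforms */
    printf("%f\n", (float)(d * f)); /* narrowed only after the multiply */
    return 0;
}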

And the output:

2.000000
inf
2.000000

This happens because, while float can obviously hold the result of the computation (about 2.0), it cannot hold the intermediate value FLT_MAX * 2.0.
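A small sketch of keeping this range issue in mind when narrowing; the helper names are hypothetical. In C++, converting a finite `double` that lies outside `float`'s range is undefined behaviour (IEEE 754 platforms typically produce `inf`, as in the output shown), so a conservative check against `FLT_MAX` can guard the cast, and multiplying in `double` first means only the final result has to fit.

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical helper: true if x lies within float's finite range, so that
// static_cast<float>(x) is well defined (the result may still be rounded).
bool fits_in_float_range(double x) {
    return std::isfinite(x) && std::fabs(x) <= FLT_MAX;
}

// Hypothetical helper: multiply in double first (f is promoted), then narrow
// the result once. Returns false instead of narrowing an out-of-range product.
bool mul_then_narrow(double d, float f, float& out) {
    const double product = d * f;
    if (!fits_in_float_range(product)) {
        return false;
    }
    out = static_cast<float>(product);
    return true;
}
```

With the values from this answer, `mul_then_narrow(FLT_MAX * 2.0, 1.0f / FLT_MAX, out)` succeeds because only the final product, roughly 2.0, has to fit in a float.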

Cory Nelson
  • This makes sense to me, but doesn't explain the behavior in the above with regards to rounding. Namely, why would `with_cast` be more accurate than `no_cast`? (see edit 2 above) – Rollie Nov 05 '14 at 03:00

When an operation mixes types, the compiler converts the operands to the larger of the types involved (the usual arithmetic conversions); here that is double. In my opinion, the operation `(float)(var1f * var2)` therefore has more accuracy.

Patrick

I tested it and they aren't equal: the program below prints 1 (true). http://codepad.org/3GytxbFK

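#include <iostream>

using namespace std;

int main(){
  double a = 1.0/7;              // 1/7 repeats in binary, so it is inexact
  float b = 6.0f;
  float c = 6.0f;
  b = b * (float)a;              // narrow a first, then multiply in float
  c = (float)((double)c * a);    // multiply in double, then narrow once
  cout << (b-c != 0.0f) << endl; // prints 1 if the two results differ
  return 0;
}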

This leads me to reason that casting the double-precision product to float has a better chance of rounding correctly: some bits can fall off the end in the float multiplication that would have been correctly accounted for had the multiplication been carried out in double and then cast to float.

BTW, I chose 1/7*6 because it repeats in binary.

Edit: Upon research, it seems the rounding should be the same both for conversion from double to float and for multiplication of floats, at least in an implementation conforming to IEEE 754. https://en.wikipedia.org/wiki/Floating_point#Rounding_modes

Tyler
  • you should also check `b-c` before you do the multiplications, and see which of those two results is closer to zero – M.M Nov 04 '14 at 07:21
  • @MattMcNabb This is what I did with my edit example; Tyler's logic is what I suspected as well, but I also thought it possible that mathematically there wouldn't be a difference. Clearly there is! – Rollie Nov 04 '14 at 07:22
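Following the first comment's suggestion, here is a minimal sketch for checking which of the two results is closer: measure each `float` against a double-precision reference. The function names are hypothetical, and treating `6.0 * a` computed in `double` as the "true" value is an assumption of this sketch.

```cpp
#include <cmath>

// Absolute error of a float result against a double-precision reference.
double abs_error(float approx, double reference) {
    return std::fabs(static_cast<double>(approx) - reference);
}

// The answer's two variants, written as hypothetical functions.
float mul_in_float(float x, double a) {
    return x * static_cast<float>(a);  // narrow a, then multiply in float
}

float mul_in_double(float x, double a) {
    return static_cast<float>(static_cast<double>(x) * a);  // narrow once at the end
}
```

For example, with `a = 1.0 / 7`, `abs_error(mul_in_float(6.0f, a), 6.0 * a)` and `abs_error(mul_in_double(6.0f, a), 6.0 * a)` can be printed side by side.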

Based on the figures from your code dump, two adjacent possible values of float are:

        d1 =  8.37149524...
        d2 =  8.37149620...

The result of doing the multiplication in double precision is:

              8.37149598...

which is in between those two, of course. Converting this result to float is implementation-defined as to whether it "rounds" up or down. In your code results, the conversion has selected d1, which is permitted, even though it is not the closest. The mixed-precision multiplication ended up with d2.

So we can conclude, somewhat unintuitively, that doing a calculation of doubles in double precision and then converting to float is in some cases less accurate than doing it entirely in float precision!
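Since the C++ standard leaves the choice between two adjacent floats implementation-defined, one way to see what an implementation claims is `std::numeric_limits`. A small sketch, assuming the reported `round_style` is the one that applies to the conversions in question; it is a compile-time constant describing the type's rounding, so it cannot reflect runtime changes made with `std::fesetround`.

```cpp
#include <limits>

// True if this implementation reports IEEE 754 (IEC 559) floats and
// round-to-nearest as the rounding style for float.
constexpr bool float_reports_round_to_nearest() {
    return std::numeric_limits<float>::is_iec559 &&
           std::numeric_limits<float>::round_style == std::round_to_nearest;
}
```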

M.M