
I stumbled on a function that I think is unnecessary, and generally scares me:

float coerceToFloat(double x) {
    volatile float y = static_cast<float>(x);
    return y;
}

Which is then used like this:

// double x
double y = coerceToFloat(x);

Is this ever any different from just doing this?:

double y = static_cast<float>(x);

The intention seems to be to just strip the double down to single precision. It smells like something written out of extreme paranoia.

Ben
  • No there's no difference. As for the reasons, that's really not something we can speculate about (especially without any more context). You have to ask the original author for that. – Some programmer dude Nov 02 '18 at 12:54
  • I have no idea why the author of the code used a `volatile` variable. The function is no different from `float coerceToFloat(double x) { return static_cast<float>(x); }` as far as I am aware. – NathanOliver Nov 02 '18 at 12:55
  • I mean, it's good practice to give this operation a name. `coerceToFloat` is certainly a lot more explicit about the intent than a plain static cast. The volatile... Hm. Maybe for debugging? – Max Langhof Nov 02 '18 at 12:56
  • @MaxLanghof it does, programming by guessing does not work – Slava Nov 02 '18 at 12:57
  • I may have found a bread crumb. [This](https://stackoverflow.com/a/3580429/4342498) says using `volatile` can break up floating-point operations. Maybe the author used it for the same reason: it forces the compiler to truncate here instead of optimizing the store away and never producing an intermediate single-precision result. – NathanOliver Nov 02 '18 at 13:04
  • Oh! @NathanOliver, then `volatile` is used to prevent an optimization that might stop the code from doing what it was intended for! An amazing find! – scipsycho Nov 02 '18 at 13:06
  • I don't think the standard guarantees this; [see a somewhat related question](https://stackoverflow.com/a/53076562/1708801) – Shafik Yaghmour Nov 02 '18 at 13:09
  • Yes, I think `volatile` is there to say "you must make this fit into an actual `float` before continuing". I was looking back at [an answer to a different question of mine](https://stackoverflow.com/a/49440753/874660) which claims "The C++ standard requires that excess precision be discarded in assignments and casts." I had remembered the "in assignments" part of that, which makes me think `volatile` isn't necessary (that just assigning to a named float would suffice), but forgot the "and casts" part, which seems to answer this question. Although that answer didn't actually cite the standard. – Ben Nov 02 '18 at 13:30
  • @thoron `volatile` means that the object must be represented as the ABI expects it to be, not how the optimizer would want to most efficiently represent it. – curiousguy Nov 03 '18 at 00:32
  • @Ben Assignments and casts may disappear after compilation to intermediate code. The compiler's internal language probably doesn't have a concept of a cast, as it isn't a semantic construct in normal programming (relaxed aka arbitrary/random floating-point semantics is abnormal programming). Compiler writers often plainly refuse to implement parts of the language semantics they consider craycray, like the visible union punning rule in C. – curiousguy Nov 05 '18 at 02:18

4 Answers


`static_cast<float>(x)` is required to remove any excess precision, producing a `float`. While the C++ standard generally permits implementations to retain excess floating-point precision in expressions, that precision must be removed by cast and assignment operators.

The license to use greater precision is in C++ draft N4659 clause 8, paragraph 13:

The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

Footnote 64 says:

The cast and assignment operators must still perform their specific conversions as described in 8.4, 8.2.9 and 8.18.
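
As an illustration (a minimal sketch of mine, not taken from the standard text): `FLT_EVAL_METHOD` from `<cfloat>` reports whether an implementation evaluates expressions with excess precision, and a cast to `float` must produce an actual `float` value regardless:

#include <cfloat>
#include <cstdio>

int main() {
    // FLT_EVAL_METHOD describes the excess precision in use:
    // 0 = each operation uses its own type, 1 = float math done
    // as double, 2 = everything done as long double (classic x87).
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);

    double d = 1.0 / 3.0;             // not representable in float
    float f = static_cast<float>(d);  // must discard excess precision
    // f now holds exactly a float value, so this comparison is well
    // defined and prints 0 on any conforming implementation:
    printf("%d\n", static_cast<double>(f) == d);
}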

Eric Postpischil
  • I'm not sure that this is correct. Footnotes are non-normative. Maybe the intent of this footnote is to say that with a cast/assignment one can change the type, but the excess precision can stay. It is a quality-of-implementation issue, so compilers likely drop the extra precision, but they aren't required to do so. I'm not saying that this is the correct interpretation, but it seems plausible to me. – geza Nov 02 '18 at 14:01
  • @geza: Footnotes are non-normative, but this behavior is well-known for C and C++; there are other Stack Overflow questions and answers discussing it. While the footnote is non-normative, it informs us that the proper interpretation of the normative text about cast operators is that they do actually perform the conversion they say they do, *i.e.*, a conversion from `double` to `float` does in fact produce a `float`. – Eric Postpischil Nov 02 '18 at 14:07
  • @EricPostpischil: I don't think the authors thought it would particularly matter whether the Standard technically allows behavior that was contrary to common expectations but might be useful for some purposes. If the Standard would forbid the unusual behavior, quality implementations intended for purposes that would benefit significantly from it should support both a conforming mode that behaves in the commonly-expected fashion and a non-conforming mode that behaves in the unusual one, but even if the Standard allows the behavior implementations should *still* offer the same two modes. – supercat Nov 02 '18 at 16:10
  • Unfortunately there are various cases where compilers don't do that; especially once inlining kicks in, you can end up with calculations whose results live in extended floating-point registers, exceeding even long double precision (usually by one bit). Going through a volatile has proven to be the only thing that works reliably for me. – PlasmaHH Nov 02 '18 at 16:10
  • @PlasmaHH: Please show us a code sample with an implementation that does not remove the excess precision when a cast or assignment is performed, along with the assembly generated by that implementation. – Eric Postpischil Nov 02 '18 at 17:01

Following up on the comment by @NathanOliver -- compilers are allowed to do floating-point math at higher precision than the types of the operands require. Typically on x86 that means that they do everything as 80-bit values, because that's the most efficient in the hardware. It's only when a value is stored that it has to be reverted to the actual precision of the type.

And even then, most compilers by default will do optimizations that violate this rule, because forcing that change in precision slows down the floating-point operations. Most of the time that's okay, because the extra precision isn't harmful. If you're a stickler, you can use a command-line switch to force the compiler to honor that storage rule, and you might see that your floating-point calculations are significantly slower.

In that function, marking the variable volatile tells the compiler that it cannot elide storing that value; that, in turn, means that it has to reduce the precision of the incoming value to match the type that it's being stored in. So the hope is that this would force truncation.

And, no, writing a cast instead of calling that function is not the same, because the compiler (in its non-conforming mode) can skip the assignment to y if it determines that it can generate better code without storing the value, and it can skip the truncation as well. Keep in mind that the goal is to run floating-point calculations as fast as possible, and having to deal with niggling rules about reducing precision for intermediate values just slows things down.

In most cases, running flat-out by skipping intermediate truncations is what serious floating-point applications need. The rule requiring truncation on storage is more of a hope than a realistic requirement.
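
To make that concrete, here is a minimal sketch (the plainCast name and the main driver are mine, purely for illustration); compile it with and without a fast-math switch and compare the output:

#include <cstdio>

// The question's function: the volatile store forces the value
// through an actual 32-bit float object in memory.
float coerceToFloat(double x) {
    volatile float y = static_cast<float>(x);
    return y;
}

// A plain cast; a non-conforming fast-math optimizer may keep the
// result in a wider register and skip the truncation entirely.
float plainCast(double x) {
    return static_cast<float>(x);
}

int main() {
    double x = 0.1;  // not exactly representable in float or double
    printf("%.17g\n", static_cast<double>(coerceToFloat(x)));
    printf("%.17g\n", static_cast<double>(plainCast(x)));
    // On a conforming build both lines match; under x87 with
    // aggressive optimization the second may retain extra bits.
}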

On a side note, Java originally required that all floating-point math be done at the exact precision required by the types involved. You can do that on Intel hardware by telling it not to extend fp types to 80 bits. This was met with loud complaints from number crunchers because that makes calculations much slower. Java soon changed to the notion of "strict" fp and "non-strict" fp, and serious number crunching uses non-strict, i.e., make it as fast as the hardware supports. People who thoroughly understand floating-point math (that does not include me) want speed, and know how to cope with the differences in precision that result.

Pete Becker
  • *People who thoroughly understand floating-point math* And that does not include me either. Thanks for taking my blurb and turning it into a coherent answer +1 – NathanOliver Nov 02 '18 at 13:29
  • “Typically on x86 that means that they do everything as 80-bit values, because that's the most efficient in the hardware” is dubious. Some compilers may have used the 80-bit floating-point registers in the past, but processor designs have moved on, and there are now disadvantages to using those old registers and their operations. – Eric Postpischil Nov 02 '18 at 13:34
  • @EricPostpischil I have yet to see a *conforming implementation*. A lot of the time they will be nonconforming to perform better, with a flag that will do what the standard says, but at a cost to you. – NathanOliver Nov 02 '18 at 13:34
  • @NathanOliver: Please show us a code sample with an implementation that does not remove the excess precision when a cast or assignment is performed along with the assembly generated by that implementation. – Eric Postpischil Nov 02 '18 at 13:35
  • @EricPostpischil -- you're right that 80-bit stuff is neanderthal era. I keep forgetting that what I learned 20 years ago is not necessarily state-of-the-art. – Pete Becker Nov 02 '18 at 13:37
  • @EricPostpischil -- re: "A conforming C++ implementation may not ..." indeed. I think I said that quite clearly a couple of times. – Pete Becker Nov 02 '18 at 13:39
  • @PeteBecker: The C language was designed on the presumption that the type used for a floating-point argument was independent of the choice of floating-point types used within the expression. The only reason 80-bit values are "primeval" is that C89 didn't provide a means by which implementations could expose a longer-than-double type used for temporary expression evaluation without fundamentally breaking the language. – supercat Nov 02 '18 at 16:28
  • @PeteBecker: I think non-strict fp still requires rounding to `double` precision in most cases. The only place behavior is allowed to differ is with denormalized numbers. To use a decimal analogy: if precision is 3 digits and the minimum exponent is -9, the precise value of multiplying 5.26E-9 times 2.0E-2 would be 1.052E-10, which should be rounded to 0.11E-9. Rounding to three digits without truncating the exponent, however, would yield 1.05E-10, which would then be rounded to 0.10E-9. Guaranteeing correct rounding in that case is hard on hardware... – supercat Nov 02 '18 at 16:52
  • ...that doesn't combine rounding and denormalization in the same step, and for most purposes would offer no benefit sufficient to justify the huge extra cost. – supercat Nov 02 '18 at 16:57
  • @supercat -- sigh; down the rabbit hole. Strict-fp in Java requires that calculations that involve two doubles be done at the precision of double. It does not allow doing the calculation at 80 bits and rounding the result to double; that can produce a different result. – Pete Becker Nov 02 '18 at 17:15
  • @PeteBecker: Strict-fp does require that, but *even non-strict fp* still requires such behavior in cases that don't involve denormals. – supercat Nov 02 '18 at 17:16

Some compilers have this concept of "extended precision", where doubles carry with them more than 64 bits of data. This results in floating-point calculations that don't match the IEEE standard.

The above code could be an attempt to prevent extended-precision flags on the compiler from optimizing away the precision loss. Such flags explicitly violate the precision assumptions of double and float values, but it seems plausible that they wouldn't do so through a volatile variable.
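
A classic way to observe the effect (a hypothetical test of mine, assuming an x87 target such as GCC with `-m32 -mfpmath=387`) is to exploit the extra exponent range of the 80-bit registers:

#include <cstdio>

int main() {
    volatile double huge = 1e308;  // volatile blocks constant folding
    // In strict IEEE double arithmetic, huge * huge overflows to
    // infinity before the division. If the intermediate stays in an
    // 80-bit x87 register, its wider exponent range absorbs the
    // overflow and the result comes out as roughly 1e308.
    double r = huge * huge / 1e308;
    printf("%g\n", r);  // "inf" strictly; "1e+308" with excess range
}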

Yakk - Adam Nevraumont

Regardless of whether such a cast is allowed to be optimized away, it does happen in practice, and the volatile assignment stops it from happening.

For example, MSVC compiling for 32-bit (so using x87) with `/Ox /fp:fast`:

_x$ = 8                                       ; size = 8
float uselessCast(double) PROC                         ; uselessCast
        fld     QWORD PTR _x$[esp-4]
        ret     0
float uselessCast(double) ENDP                         ; uselessCast

_y$ = 8                                       ; size = 4
_x$ = 8                                       ; size = 8
float coerceToFloat(double) PROC                   ; coerceToFloat
        fld     QWORD PTR _x$[esp-4]
        fstp    DWORD PTR _y$[esp-4]
        fld     DWORD PTR _y$[esp-4]
        ret     0
float coerceToFloat(double) ENDP 

Where `uselessCast` is as below and `coerceToFloat` is as in the question.

float uselessCast(double x)
{
    return static_cast<float>(x);
}

Similarly, GCC and Clang with `-O3 -ffast-math -m32 -mfpmath=387`:

uselessCast(double):
    fld     QWORD PTR [esp+4]
    ret
coerceToFloat(double):
    sub     esp, 20
    fld     QWORD PTR [esp+24]
    fstp    DWORD PTR [esp+12]
    fld     DWORD PTR [esp+12]
    add     esp, 20
    ret

Godbolt link for all the above

Of course you may argue that with `/fp:fast` or `-ffast-math` you should not expect much from floating-point arithmetic anyway, but you may need those flags and still want a reliable way to discard excess precision.

harold
  • I could see usefulness in allowing programmers to let compilers waive truncation/rounding when storing values to register-qualified automatic objects. This would be true of small signed integer types as well as floating-point types (so that after `register int8_t foo=127; foo = foo+1;`, the value of `foo` could store either +128 or -128 at the compiler's leisure). Note that such an assignment *isn't* UB. A programmer's ability to use such waivers, however, would be undermined by having a compiler also apply them to things like casts whose sole purpose may have been to force truncation. – supercat Nov 02 '18 at 16:22