
In C++ (or maybe just our compilers, VC8 and VC10), 3.14 is a double literal and 3.14f is a float literal.

Now I have a colleague who stated:

We should use float literals for float calculations and double literals for double calculations, as this can affect the precision of a calculation when constants are used.

Specifically, I think he meant:

double d1, d2;
float f1, f2;
... init and stuff ...
f1 = 3.1415  * f2;
f1 = 3.1415f * f2; // any difference?
d1 = 3.1415  * d2;
d1 = 3.1415f * d2; // any difference?

Or, added by me, even:

d1 = 42    * d2;
d1 = 42.0f * d2; // any difference?
d1 = 42.0  * d2; // any difference?

More generally, the only point I can see for using 2.71828183f is to make sure that the constant I'm trying to specify will actually fit into a float (compiler error/warning otherwise).

Can someone shed some light on this? Do you specify the f postfix? Why?

To quote from an answer what I implicitly took for granted:

If you're working with a float variable and a double literal the whole operation will be done as double and then converted back to float.

Could there possibly be any harm in this? (Other than a very, very theoretical performance impact?)

Further edit: It would be nice if answers containing technical details (appreciated!) could also include how these differences affect general purpose code. (Yes, if you're number crunching, you probably like to make sure your big-n floating point ops are as efficient (and correct) as possible -- but does it matter for general purpose code that's called a few times? Isn't it cleaner if the code just uses 0.0 and skips the -- hard to maintain! -- float suffix?)

Deduplicator
Martin Ba

7 Answers


Yes, you should use the f suffix. Reasons include:

  1. Performance. When you write float foo(float x) { return x*3.14; }, you force the compiler to emit code that converts x to double, does the multiplication, and then converts the result back to single precision. If you add the f suffix, both conversions are eliminated. On many platforms, each of those conversions is about as expensive as the multiplication itself.

  2. Performance (continued). There are platforms (most cellphones, for example), on which double-precision arithmetic is dramatically slower than single-precision. Even ignoring the conversion overhead (covered in 1.), every time you force a computation to be evaluated in double, you slow your program down. This is not just a "theoretical" issue.

  3. Reduce your exposure to bugs. Consider the example `float x = 1.2; if (x == 1.2) /* something */;` Is something executed? No, it is not: x holds 1.2 rounded to a float, but it is compared against the double-precision value 1.2. The two are not equal.

Stephen Canon
  • Stephen - Note that one underlying reason I'm asking is that *maintaining* the float suffix is "impossible" (AFAIK) because our compiler doesn't warn. So we could cough up this rule, but it couldn't be enforced and use would just be anecdotal and inconsistent. (Or so I fear.) Note: I doubt there are any critical code paths where a *literal* is actually used in a calculation here. – Martin Ba Oct 05 '11 at 14:36
  • Your current compiler may not do so, but many compilers *do* warn when a double literal can't be assigned to a float without loss of significance, such as in `float x = 0.3;`. That, for me, is reason enough to religiously use the `f` suffix. – Russell Borogove Feb 16 '12 at 20:18
  • @RussellBorogove: Do those same compilers warn about e.g. `double d=1f/10f`? I would posit that the vast majority of conversions from `double` to `float` will yield exactly the behavior the programmer would intend, while a very large fraction of conversions from `float` to `double` that involve non-whole-number values are wrong. – supercat Sep 04 '13 at 07:32
  • @supercat: `clang -Wconversion` yields `warning: implicit conversion loses floating-point precision: 'double' to 'float'` for double-to-float, and no diagnostic for float-to-double. The value `1.f/10.f` is unquestionably a `float` whatever the programmer may have intended, and its value can be exactly represented in a `double`, so there's just no lossage to warn about. – Russell Borogove Sep 04 '13 at 18:54
  • @RussellBorogove: What is the purpose of *having* warnings, if not to tell the programmer that code is likely to do something other than what was intended? Suppose one program contains `#define phi 1.6180339887498949`, and later on uses `phi` in some computations with `float` values and some with `double`. A second program is identical except for `#define phi 1.618034f`. Which program's behavior is more likely to match the programmer's intention? – supercat Sep 04 '13 at 19:16
  • I'm really not sure what you're getting at, but if the programmer is mixing float and double computations, she's going to get inconsistencies regardless of the precision of her constants, won't she? – Russell Borogove Sep 04 '13 at 20:00
  • Add overload resolution? – Deduplicator May 15 '14 at 12:46

I did a test.

I compiled this code:

float f1(float x) { return x*3.14; }            
float f2(float x) { return x*3.14F; }   

Using gcc 4.5.1 for i686 with optimization -O2.

This was the assembly code generated for f1:

pushl   %ebp
movl    %esp, %ebp
subl    $4, %esp # Allocate 4 bytes on the stack
fldl    .LC0     # Load a double-precision floating point constant
fmuls   8(%ebp)  # Multiply by parameter
fstps   -4(%ebp) # Store single-precision result on the stack
flds    -4(%ebp) # Load single-precision result from the stack
leave
ret

And this is the assembly code generated for f2:

pushl   %ebp
flds    .LC2          # Load a single-precision floating point constant
movl    %esp, %ebp
fmuls   8(%ebp)       # Multiply by parameter
popl    %ebp
ret

So the interesting thing is that for f1, the compiler stored the value and re-loaded it just to make sure that the result was rounded to single precision.

If we use the -ffast-math option, then this difference is significantly reduced:

pushl   %ebp
fldl    .LC0             # Load double-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret


pushl   %ebp
flds    .LC2             # Load single-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret

But there is still the difference between loading a single or double precision constant.

Update for 64-bit

These are the results with gcc 5.2.1 for x86-64 with optimization -O2:

f1:

cvtss2sd  %xmm0, %xmm0       # Convert arg to double precision
mulsd     .LC0(%rip), %xmm0  # Double-precision multiply
cvtsd2ss  %xmm0, %xmm0       # Convert to single-precision
ret

f2:

mulss     .LC2(%rip), %xmm0  # Single-precision multiply
ret

With -ffast-math, the results are the same.

Vaughn Cato

I suspect something like this: If you're working with a float variable and a double literal the whole operation will be done as double and then converted back to float.

If you use a float literal, notionally speaking the computation will be done at float precision even though some hardware will convert it to double anyway to do the calculation.

Mark B
  • Or more. x86 uses 80-bit extended precision for floating-point registers. – Puppy Oct 05 '11 at 13:48
  • @DeadMG: Not for SIMD instructions?! – smerlin Oct 05 '11 at 13:52
  • @DeadMG: x86 uses 80-bit extended *if you're on a system that codegens floating-point computation to x87 and doesn't set the precision in the control word*. More and more systems use SSE for floating-point computation, which avoids this detail entirely. Unless you specifically require the 80-bit format, new software should use SSE. – Stephen Canon Oct 05 '11 at 14:08

Typically, I don't think it will make any difference, but it is worth pointing out that 3.1415f and 3.1415 are (typically) not equal. On the other hand, you don't normally do any calculations in float anyway, at least on the usual platforms. (double is just as fast, if not faster.) About the only time you should see float is when there are large arrays, and even then, all of the calculations will typically be done in double.

James Kanze
  • There are plenty of platforms on which `float` is faster than `double`, including one of the most ubiquitous computing platforms of all -- a typical smart phone. – Stephen Canon Oct 05 '11 at 14:11
  • James - do you mean to say that `3.1415f` converted to double is *not* equal to a double value initialized by the double literal `3.1415` (and vice versa)? Why would that be? – Martin Ba Oct 05 '11 at 14:12
  • @Martin Yes. The reason they don't compare equal is because they have different values. `3.1415` cannot be exactly represented in either `float` or `double` (at least not on any of the usual platforms). `3.1415` is converted to the closest value representable in a `double`, and `3.1415f` is converted to the closest value representable in a `float`. Assigning `3.1415` to a `float` will probably end up rounding to the same thing as `3.1415f`, but assigning `3.1415f` to a `double` will still result in the value exactly representable in a `float`. – James Kanze Oct 05 '11 at 14:23
  • @Martin Specifically, converting to float rounds off to the nearest [for this magnitude] 2^-23, which happens to result in 6588203*2^-21 (3.141499996185302734375), which is 8589935*2^-51 away from the double version which was rounded to the nearest 2^-51. – Random832 Oct 05 '11 at 15:46
  • @Random832 If the hardware uses IEEE floating point:-). (But I admire your courage to work out the exact numbers. I thought about it, but decided it would be too much work.) – James Kanze Oct 05 '11 at 16:04

There is a difference: If you use a double constant and multiply it with a float variable, the variable is converted into double first, the calculation is done in double, and then the result is converted into float. While precision isn't really a problem here, this might lead to confusion.

thiton
  • *How* might this lead to confusion? (Of whom?) – Martin Ba Oct 05 '11 at 13:43
  • I'm sorry that I have only far-fetched examples in my head, but this might lead to unexpected template specialization or overload choice, for example, confusing the developer who is not aware of the promotion. – thiton Oct 05 '11 at 13:47

I personally tend to use the f suffix as a matter of principle, and to make it as obvious as I can that the value is a float rather than a double.

My two cents

Martin

From the C++ Standard (Working Draft), section 5 on binary operators:

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

  • If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
  • If either operand is of type long double, the other shall be converted to long double.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.

And also section 4.8:

A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.

The upshot of this is that you can avoid unnecessary conversions by specifying your constants in the precision dictated by the destination type, provided you will not lose precision in the calculation by doing so (i.e., your operands are exactly representable in the precision of the destination type).

Andrew Marshall
  • Yes, but will these *unnecessary conversions* do any harm? (I'm actually wondering whether they'll be even visible in the assembly.) – Martin Ba Oct 05 '11 at 14:10
  • @Martin If they're visible in the assembly, then they do harm, if only the performance cost of the conversion. Of course we could always assume our compilers do every optimization for us, but we should help as much as possible, and keeping the difference between single and double precision in mind is crucial to writing numerically stable code. If you care about the difference between single and double precision, then use the correct literals. If you don't care, then you can always use doubles everywhere anyway. – Christian Rau Oct 05 '11 at 14:26
  • @Christian: I'm really wondering if this would be just unnecessary micro-optimization. – Martin Ba Oct 05 '11 at 14:38
  • @Martin Regarding performance: it is not negligible everywhere, as stated in many answers. And, more importantly, regarding correctness, see Stephen's and James' answers. But if you find using single-precision floats over double precision to be micro-optimization (which might be true in many cases on standard x86 hardware), then using doubles everywhere spares you from thinking about single-precision literals anyway. – Christian Rau Oct 05 '11 at 14:45
  • @Martin Rhetorical counter-example: Is making only the necessary member functions of a class virtual (instead of all of them) micro-optimization? Think of performance, correctness and style/cleanliness. – Christian Rau Oct 05 '11 at 14:54