Is there a difference between this (using floating point literal suffixes):

float MY_FLOAT = 3.14159265358979323846264338328f; // f suffix
double MY_DOUBLE = 3.14159265358979323846264338328; // no suffix 
long double MY_LONG_DOUBLE = 3.14159265358979323846264338328L; // L suffix

vs this (using floating point casts):

float MY_FLOAT = (float)3.14159265358979323846264338328;
double MY_DOUBLE = (double)3.14159265358979323846264338328;
long double MY_LONG_DOUBLE = (long double)3.14159265358979323846264338328;

in C and C++?

Note: the same would go for function calls:

void my_func(long double value);

my_func(3.14159265358979323846264338328L);
// vs
my_func((long double)3.14159265358979323846264338328);
// etc.

Related:

  1. What's the C++ suffix for long double literals?
  2. https://en.cppreference.com/w/cpp/language/floating_literal
Gabriel Staples

3 Answers


An unsuffixed floating-point literal has type double. Assuming IEEE 754 floating point, every float value is exactly representable as a double, so omitting the f suffix loses nothing as long as the literal's value is itself exactly representable as a float. EDIT: if the literal has to be rounded, the result can differ, because the value is rounded twice (first to double, then to float); see Eric Postpischil's answer. So you should use the f suffix for float constants as well.


This example is also problematic:

long double MY_LONG_DOUBLE = (long double)3.14159265358979323846264338328;

This first gives a double constant, which is then converted to long double. On implementations where long double has more precision than double, you have already lost precision at the double step, and the cast cannot bring it back. Therefore, if you want full precision in long double constants you must use the L suffix:

long double MY_LONG_DOUBLE = 3.14159265358979323846264338328L; // L suffix
orlp
  • Interesting. I had no idea. I always thought the compiler was smart enough to know to just treat `(long double)3.14159265358979323846264338328` equivalently as though I had typed `3.14159265358979323846264338328L`. – Gabriel Staples Dec 04 '20 at 03:19
  • @GabrielStaples See also [this question](https://stackoverflow.com/questions/37606564/what-is-the-default-data-type-of-number-in-c). Notably large `unsigned long long` constants need the `ULL` suffix or they will overflow. – orlp Dec 04 '20 at 03:21
  • @GabrielStaples, That does complicate the compiler. Instead of having a long double literal just from lexing, it would now have to parse and special-case the cast, too. That's probably one fair reason why it doesn't work like that. – chris Dec 04 '20 at 03:23
  • In C and C++ (not just IEEE), the set of values a `float` can represent is a strict subset of what a `double` can represent, and the set of values a `double` can represent is a strict subset of what a `long double` can represent. – Peter Dec 04 '20 at 03:38
  • Re "double is a strict superset of float, and thus you will never lose precision by not specifying f". That is not necessarily so because of potential *double-rounding* issues: A decimal floating-point literal first rounded to binary double precision with the result then rounded to binary single precision could produce a different result from same decimal floating-point literal with `f` suffix rounded to binary single precision right away. – njuffa Dec 04 '20 at 04:33
  • @njuffa, can you provide an example in code? That's hard to follow. – Gabriel Staples Dec 04 '20 at 04:52
  • @GabrielStaples Rick Regan has an excellent write-up in his blog "Exploring Binary": https://www.exploringbinary.com/double-rounding-errors-in-decimal-to-double-to-float-conversions/ – njuffa Dec 04 '20 at 05:05
  • @GabrielStaples: double rounding can always cause trouble. Consider 1.49. If you round it to the nearest 0.1, then to an integer, this happens: 1.49 -> 1.5 -> 2. But if you round it to an integer straight away, this happens: 1.49 -> 1. As you can see, the result is different. So this answer is not entirely correct as it ignores this fact. See Eric's answer for a more complete description. So the complete answer is this and Eric's answer combined. – geza Dec 04 '20 at 11:53
  • @EricPostpischil Sorry, I have no idea what you are trying to point out to me or what question you may be asking. I pointed to Rick Regan's blog since asker requested *a* concrete example of double-rounding issues in conversion from decimal floating-point literals and I did not have one handy. Generally speaking, I have found Regan's blog to offer quite reliable information so I don't think I provided a disservice by pointing to it. – njuffa Dec 06 '20 at 05:00
  • @EricPostpischil I naturally assumed that the fewer the number of decimal places in the floating-point literal, the fewer the number of discrepancies due to double-rounding there are to be found. In fact, what Regan states seems to confirm this intuition: "Having tested all numbers with 9 digits or fewer, I found only one 7-digit number, nine 8-digit numbers, and 51 9-digit numbers." – njuffa Dec 06 '20 at 08:19
  • @njuffa: Ah, I just went hunting on my own, partly because figuring out ways to search is interesting in itself. I did not actually look at Regan’s article until later. Never mind. – Eric Postpischil Dec 06 '20 at 13:40

There is a difference between using a suffix and a cast; 8388608.5000000009f and (float) 8388608.5000000009 have different values in common C implementations. This code:

#include <stdio.h>

int main(void)
{
    float x =         8388608.5000000009f;
    float y = (float) 8388608.5000000009;
    printf("%.9g - %.9g = %.9g.\n", x, y, x-y);
}

prints “8388609 - 8388608 = 1.” in Apple Clang 11.0 and other implementations that use correct rounding with IEEE-754 binary32 for float and binary64 for double. (The C standard permits implementations to use methods other than IEEE-754 correct rounding, so other C implementations may have different results.)

The reason is that (float) 8388608.5000000009 contains two rounding operations. With the suffix, 8388608.5000000009f is converted directly to float, so the portion that must be discarded in order to fit in a float, .5000000009, is directly examined in order to see whether it is greater than .5 or not. It is, so the result is rounded up to the next representable value, 8388609.

Without the suffix, 8388608.5000000009 is first converted to double. When the portion that must be discarded, .0000000009, is considered, it is found to be less than ½ the low bit at the point of truncation. (The value of the low bit there is .00000000186264514923095703125, and half of it is .000000000931322574615478515625.) So the result is rounded down, and we have 8388608.5 as a double. When the cast rounds this to float, the portion that must be discarded is .5, which is exactly halfway between the representable numbers 8388608 and 8388609. The rule for breaking ties rounds it to the value with the even low bit, 8388608.

(Another example is “7.038531e-26”; (float) 7.038531e-26 is not equal to 7.038531e-26f. This is the only such numeral with fewer than eight significant digits when float is binary32 and double is binary64, except of course “-7.038531e-26”.)

Eric Postpischil
  • Very good point, I have edited my answer linking to this one. I was only thinking about representable values, forgetting about the double rounding. – orlp Dec 04 '20 at 12:46

Setting aside the double-rounding issue described in Eric Postpischil's answer, omitting the f in a float constant can cause other surprises. Consider this:

#include <stdio.h>

#define DCN 0.1
#define FCN 0.1f

int main(void)
{
    float f = DCN;
    printf("DCN\t%s\n", f > DCN ? "more" : "not-more");
    float g = FCN;
    printf("FCN\t%s\n", g > FCN ? "more" : "not-more");
    return 0;
}

This (compiled with gcc 9.1.1) produces the output

DCN more
FCN not-more

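The explanation is that in f > DCN the compiler takes DCN to have type double and so converts f to double for the comparison, whereas in g > FCN both operands are float and compare equal. Rounding 0.1 to float happens to round up, and converting the result back to double does not undo that, so

(double)(float)0.1 > 0.1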

Personally, on the (rare) occasions when I need float constants, I always use an 'f' suffix.

dmuir