18

In most of the code I see around, double is favoured over float, even when high precision is not needed.

Since there are performance penalties when using double (CPU/GPU/memory/bus/cache/...), what is the reason for this overuse of double?

Example: in computational fluid dynamics, all the software I have worked with uses doubles. In this case high precision is useless (because of the errors due to the approximations in the mathematical model), and there is a huge amount of data to be moved around, which could be cut in half using floats.

The fact that today's computers are powerful is meaningless, because they are used to solve more and more complex problems.

Pietro
  • 3
    "even when a high precision is not needed" - and if they had `float`s, you could complain that "even when a high performance is not needed"... – Karoly Horvath Apr 02 '14 at 17:13
  • 5
    Because performance is the least concern of most code paths, and extra-precision cannot hurt (whereas the reverse is true ?) – Matthieu M. Apr 02 '14 at 17:13
  • 11
    Depending on the architecture, the hardware (x86 for example) might only implement `double` and simulate `float` by converting to `double` and back to `float`, making it more expensive. – David Rodríguez - dribeas Apr 02 '14 at 17:13
  • @DavidRodríguez-dribeas: is that really the case? the conversion seems to be trivial. – Karoly Horvath Apr 02 '14 at 17:14
  • Some recent CPUs use 512 bits registers to store floating point numbers... Given a double is 64 bits, which performance penalties are you referring to? – Macmade Apr 02 '14 at 17:14
  • 1
    This is most likely subjective, but the performance penalties on CPU are much, much lower than the performance penalties on GPU (most CPUs have built-in double-precision floating-point units). Jumping from double to float just because it would be marginally faster is an example of premature optimization. Doubles accumulate rounding error much more slowly than singles, so you might as well go double and not worry about it. – IdeaHat Apr 02 '14 at 17:14
  • 2
    Here is a stackoverflow discussion on double and float conversion http://stackoverflow.com/questions/16737615/how-is-actually-done-floating-point-conversion-double-to-float-or-float-to-doub – Richard Chambers Apr 02 '14 at 17:16
  • 1
    @KarolyHorvath: From the point of view of the programmer it is trivial, as it is often done in hardware, but that does not mean that it does not take time. If your target platform operates on `double`, then even if the conversion were trivial there would be no value in using `float`, since `double` is not less precise. There are cases (vectorizing instructions, CUDA...) where this is not the case. But `double` is a sensible default in general – David Rodríguez - dribeas Apr 02 '14 at 17:18
  • @DavidRodríguez-dribeas: re-read my question - now, you can drop your first and last sentence.. and well.. the rest as well. and the question is still: is that really the case (a fact) - is it really more expensive on the hardware or is that just your assumption? – Karoly Horvath Apr 02 '14 at 17:21
  • 3
    @DavidRodríguez-dribeas I think I get what you refer to when you say that `double` is implemented but `float` is not, however it is not true. The old FPU instructions work with 80-bit double-extended numbers, which is neither `float` nor `double`, but it doesn't matter: it can load and save floats and doubles with no performance penalty (perhaps ironically, the instructions that load/save 80-bit floats *are* slow). On non-ancient x86 systems, both float and double are implemented directly with SSE(2). – harold Apr 02 '14 at 17:25
  • And here is a stackoverflow discussion on the rules governing mixed float and double calculations http://stackoverflow.com/questions/4239770/what-are-the-rules-governing-c-single-and-double-precision-mixed-calculations – Richard Chambers Apr 02 '14 at 17:25
  • One reason is probably that it is double and not float that is the "normal" floating-point type in C and C++: http://stackoverflow.com/a/1476038/15727 – Thomas Padron-McCarthy Apr 02 '14 at 17:29
  • 1
    Don't take Wikipedia too seriously. Use [Agner Fog's instruction tables](http://www.agner.org/optimize/instruction_tables.pdf) to see that the statement is nonsense. – Hans Passant Apr 02 '14 at 17:45
  • 2
    Just because the math model introduces larger errors doesn't mean that the floating-point error doesn't matter. Some types of problems have ill-conditioned matrices to be solved (including some geometries in CFD), where you can lose 10 digits in solving the matrix (in the best case). Doing so in double doesn't flinch at this; doing so in float means that your 8 digits of saved round-off just turned your answer into complete gibberish. – Godric Seer Apr 02 '14 at 19:46
  • 2
    I usually use double because it is relatively easy to convince myself that the precision is sufficient. Showing the same thing for float takes more sophisticated reasoning, and more time. The case for using float is when there is a performance gain that justifies the extra work, and it is in fact sufficiently precise. – Patricia Shanahan Apr 03 '14 at 03:54

6 Answers

22

Among others:

  • The savings are hardly ever worth it (number-crunching is not typical).
  • Rounding errors accumulate, so it is better to start with more precision than needed (experts may know their computation is precise enough anyway, and some calculations can be done exactly).
  • Common floating-point operations using the FPU internally often work in double or higher precision anyway.
  • C and C++ implicitly convert from float to double; the reverse is a narrowing conversion that generally deserves an explicit cast (see the sketch after this list).
  • Variadic and unprototyped functions always receive double, not float (the latter exist only in ancient C and are actively discouraged).
  • You may commonly do an operation with more precision than needed, but seldom with less, so libraries generally favor higher precision too.
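
A minimal sketch of the conversion and promotion points above (illustrative only; the variable names are made up):

```cpp
#include <cstdio>

int main() {
    float  f = 1.5f;
    double d = f;                      // float -> double: implicit and exact

    // double -> float is a narrowing conversion: brace-initialization
    // rejects it, and compilers typically warn elsewhere unless you cast.
    // float bad{d};                   // ill-formed (narrowing)
    float ok = static_cast<float>(d);  // explicit, compiles silently

    // Variadic functions (like printf) promote float arguments to double.
    std::printf("%f %f %f\n", f, d, ok);
}
```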

But in the end, YMMV: Measure, test, and decide for yourself and your specific situation.

BTW: There's even more for performance fanatics: use the IEEE half-precision type. It has little hardware or compiler support, but it cuts your bandwidth requirements in half yet again.
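
For instance, on x86 the F16C extension (where available) provides hardware conversions between float and half. A hedged sketch using compiler intrinsics (assumes a CPU and compiler with F16C support, e.g. GCC/Clang with -mf16c):

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

int main() {
    float in[4] = {1.0f, 0.5f, 3.14159f, -2.0f};

    // Pack 4 floats into 4 IEEE half-precision (16-bit) values.
    __m128  v    = _mm_loadu_ps(in);
    __m128i half = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);

    uint16_t storage[4];  // half the bytes of a float[4]
    _mm_storel_epi64(reinterpret_cast<__m128i*>(storage), half);

    // Expand back to floats when you need to compute.
    __m128i packed = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(storage));
    float   out[4];
    _mm_storeu_ps(out, _mm_cvtph_ps(packed));

    std::printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
}
```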

Deduplicator
  • 12
    "The savings are hardly worth it" - For single computations (like holding the sum in a single variable) - Sure. For fetching a lot of data - No, you're doubling the bandwith. – Karoly Horvath Apr 02 '14 at 17:26
  • "Rounding errors accumulate" - In many cases rounding errors are negligible compared to errors due to other reasons (e.g. the mathematical model). – Pietro Apr 02 '14 at 17:28
  • @presiuslitelsnoflek - True, not always. But I would say in most cases. – Pietro Apr 02 '14 at 17:31
  • 7
    Yes, not bloating out your caches is another reason to use a smaller size. Yet another is that a lot of SSE instructions have double and float versions, and the float versions operate on twice as much data in one instruction. (double your bandwidth, double your fun) – Apriori Apr 02 '14 at 17:38
  • 3
    I'd just like to add that the precision of single-floats may be limiting far more often than one might naïvely think. In my experiences with OpenGL, I have several times had to remove bias from coordinates or choose a shorter modulus on time-periodic functions than I might have liked, simply because I ran out of precision in the 32-bit floats that are the mainstay of GPUs. – Dolda2000 Feb 19 '17 at 04:32
  • in my personal history I used floats believing they'd be faster, and I had rounding precision problems due to shoving stuff into short coordinates without giving leeway (usually with +0.5 or a round() function). Switched to double and almost all issues went away, with no performance difference. From that point I was all double. Then I joined a company where everything was float. "Why?" For buffer-compatibility reasons with the GPU. Sure enough, ran into big-scale precision issues in DCCT that can edit planet-like scales. Circle of hell... – v.oddou May 22 '19 at 05:47
12

In my opinion the answers so far don't really get the right point across, so here's my crack at it.

The short answer is C++ developers use doubles over floats:

  • To avoid premature optimization when they don't understand the performance trade-offs well ("they have higher precision, why not?" is the thought process)
  • Habit
  • Culture
  • To match library function signatures
  • To match simple-to-write floating point literals (you can write 0.0 instead of 0.0f)

It's true that a double may be as fast as a float for a single computation, because most FPUs have a wider internal representation than either the 32-bit float or the 64-bit double.

However, that's only a small piece of the picture. Nowadays, arithmetic optimizations don't mean anything if you're bottlenecked on cache/memory bandwidth.

Here is why some developers seeking to optimize their code should look into using 32-bit floats over 64-bit doubles:

  • They fit in half the memory, which is like having all your caches be twice as large. (big win!!!)
  • If you really care about performance you'll use SSE instructions. SSE instructions that operate on floating-point values come in different versions for the 32-bit and 64-bit representations: the 32-bit versions fit 4 values in a 128-bit register operand, while the 64-bit versions fit only 2. In this scenario you can likely double your FLOPS by using floats over doubles, because each instruction operates on twice as much data (see the sketch after this list).
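
A minimal sketch of that difference with SSE intrinsics (illustrative only; assumes an x86 target with SSE2, and the function names are made up):

```cpp
#include <immintrin.h>

// The same 128-bit register holds 4 floats or only 2 doubles, so the
// float version does twice the work per instruction.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);              // load 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   // 4 additions at once
}

void add2(const double* a, const double* b, double* out) {
    __m128d va = _mm_loadu_pd(a);             // load 2 doubles
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));   // only 2 additions
}
```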

In general, there is a real lack of knowledge of how floating-point numbers really work among the majority of developers I've encountered. So I'm not really surprised most developers blindly use double.

Apriori
11

double is, in some ways, the "natural" floating point type in the C language, which also influences C++. Consider that:

  • an unadorned, ordinary floating-point constant like 13.9 has type double. To make it float, we have to add an extra suffix f or F.
  • default argument promotion converts float function arguments* to double: this takes place when no parameter declaration exists for an argument, such as when a function is declared variadic (e.g. printf), or when no prototype exists at all (old-style C, not permitted in C++).
  • The %f conversion specifier of printf takes a double argument, not float. There is no dedicated way to print floats; a float argument default-promotes to double and so matches %f (see the sketch after this list).
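
A small illustration of these points (the static_asserts check the literal types at compile time):

```cpp
#include <cstdio>
#include <type_traits>

// An unadorned literal is double; the f suffix makes it float.
static_assert(std::is_same<decltype(13.9),  double>::value, "plain literal");
static_assert(std::is_same<decltype(13.9f), float>::value,  "f suffix");

int main() {
    float f = 13.9f;
    // %f expects double; the float argument is default-promoted to
    // double by the variadic call, so this is well-defined.
    std::printf("%f\n", f);
}
```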

On modern hardware, float and double are usually mapped, respectively, to 32 bit and 64 bit IEEE 754 types. The hardware works with the 64 bit values "natively": the floating-point registers are 64 bits wide, and the operations are built around the more precise type (or internally may be even more precise than that). Since double is mapped to that type, it is the "natural" floating-point type.

The precision of float is poor for any serious numerical work, and the reduced range could be a problem also. The IEEE 32 bit type has only 23 stored mantissa bits (24 effective, counting the implicit leading bit; 8 bits are consumed by the exponent field and one by the sign). The float type is useful for saving storage in large arrays of floating-point values, provided that the loss of precision and range isn't a problem in the given application. For example, 32 bit floating-point values are sometimes used in audio for representing samples.
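
A quick way to see that limit (a sketch; the exact behavior assumes IEEE 754 binary32/binary64):

```cpp
#include <cstdio>

int main() {
    // float has a 24-bit significand (23 stored bits plus the implicit
    // leading 1), so not every integer above 2^24 = 16777216 is exact.
    float f = 16777216.0f;          // 2^24
    float g = f + 1.0f;             // rounds back to 2^24 on assignment
    std::printf("%s\n", g == f ? "float lost the +1" : "float kept it");

    double d = 16777216.0;
    double e = d + 1.0;             // double has 53 bits: still exact
    std::printf("%s\n", e == d ? "double lost the +1" : "double kept it");
}
```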

It is true that using a 64 bit type over a 32 bit type doubles the raw memory bandwidth. However, that only affects programs that work with large arrays of data, accessed in a pattern that shows poor locality. The superior precision of the 64 bit floating-point type trumps issues of optimization: quality of numerical results is more important than shaving cycles off the running time, in accordance with the principle of "get it right first, then make it fast".


* Note, however, that there is no general automatic promotion from float expressions to double; the only promotion of that kind is integral promotion: char, short and bitfields going to int.

Kaz
  • This statement is a bit problematic: "The hardware works with the 64 bit values "natively"". SSE/AVX registers are 128/256 bit wide and can pack floats and doubles, so both formats are equally native to hardware. – void_ptr Feb 09 '15 at 17:42
7

This is mostly hardware dependent, but consider that the most common CPUs (x86/x87 based) have an internal FPU that operates on 80-bit floating-point precision (which exceeds both float and double).

If you have to store intermediate calculations in memory, double is a good compromise between the internal precision and external space. Performance is more or less the same on single values; it may be affected by memory bandwidth on large numeric pipes (since they will have double the length).

Consider that floats have a precision of approximately 6 decimal digits. On an N-cubed complexity problem (like a matrix inversion or transformation), you lose two or three more digits in multiplications and divisions, leaving just 3 meaningful ones. On a 1920-pixel-wide display they are simply not enough (you need at least 5 to match a pixel properly).
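
A toy demonstration of that kind of loss (a sketch; the exact figures depend on hardware and compiler, but the drift is typical of IEEE single precision):

```cpp
#include <cstdio>

int main() {
    // Naively accumulate 0.1 ten million times; the true sum is 1,000,000.
    float  fs = 0.0f;
    double ds = 0.0;
    for (int i = 0; i < 10000000; ++i) {
        fs += 0.1f;   // rounding error compounds once the sum grows large
        ds += 0.1;
    }
    std::printf("float : %.2f\n", fs);  // noticeably far from 1000000
    std::printf("double: %.2f\n", ds);  // correct to many digits
}
```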

This roughly makes double preferable.

Emilio Garavaglia
  • I agree, but there are problems where precision is not critical, and what is important is data size and transfer speed (e.g. problems where the solution is stable). – Pietro Apr 02 '14 at 17:43
  • What is the purpose of a comment that says "I agree ... BUT" (and BUT negates the agree) and then adds things I also wrote about? – Emilio Garavaglia Apr 03 '14 at 06:21
  • Let's say there are two classes of problems. 1) One where precision is critical, and 2) one where the computation time is critical. Your answer applies to the first class, and in this regard I agree with you. An example of a problem of the second class could be weather forecasts: if the time taken to compute a certain period is longer than the period itself, the forecast is useless (I would get a forecast for yesterday). This is the case where the "BUT" applies. – Pietro Apr 03 '14 at 17:39
4

It is often relatively easy to determine that double is sufficient, even in cases where it would take significant numerical-analysis effort to show that float is sufficient. That saves development cost, and avoids the risk of incorrect results if the analysis is not done correctly.

Also, any performance gain from using float is usually relatively slight compared to using double, because most popular processors do all floating-point arithmetic in one format that is even wider than double.
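
One way to observe that on a given platform (a sketch; the reported values depend entirely on compiler and target):

```cpp
#include <cfloat>
#include <cstdio>

int main() {
    // FLT_EVAL_METHOD reports the intermediate evaluation format:
    //   0 = operands' own types (typical with SSE2 code generation)
    //   2 = long double everywhere (typical with the x87 FPU)
    std::printf("FLT_EVAL_METHOD     = %d\n", FLT_EVAL_METHOD);
    std::printf("sizeof(long double) = %zu\n", sizeof(long double));
}
```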

3

I think higher precision is the only reason. Actually, most people don't think much about it; they just use double.

I think that if float precision is good enough for a particular task, there is no reason to use double.

Sandro
  • I think you're absolutely right. Most developers really don't seem to think about (or for that matter understand) floating-point representation details and their implications very much. The reason so many developers use doubles is probably pretty similar to the reason you see implicit int/float and float/int conversions floating around everywhere; lack of understanding more than necessity. I think the answer is "it's culture." I however think the thought in this answer could be a little more fleshed out, with more concrete details provided, to be a viable answer to choose/accept. – Apriori Apr 02 '14 at 17:48
  • 1
    IME, developers tend to blindly use `float`, and suffer from precision problems as a result. – dan04 Apr 02 '14 at 19:41