Another question that can only be honestly answered with "wrong question". Or at least: "Are you really willing to go there?" `float` theoretically needs ca. 80% less die space (for the same number of cycles) and so can be much cheaper for bulk processing. GPUs love `float` for this reason.
However, let's look at x86 (admittedly, you didn't say what architecture you're on, so I picked the most common). The price in die space has already been paid; you literally gain nothing by using `float` for calculations. Actually, you may even lose throughput, because additional extensions from `float` to `double` are required, as well as additional rounding to intermediate `float` precision. In other words, you pay extra to get a less accurate result. This is typically something to avoid, except maybe when you need maximum compatibility with some other program.
See Jens' comment as well. These options give the compiler permission to disregard some language rules to achieve higher performance. Needless to say, this can sometimes backfire.
There are two scenarios where `float` might be more efficient, on x86:

- GPU (including GPGPU): in fact, many GPUs don't even support `double`, and if they do, it's usually much slower. Yet you will only notice this when doing very many calculations of this sort.
- CPU SIMD, a.k.a. vectorization (see the sketch after this list)
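For illustration (my sketch, not part of the original answer): with AVX on x86, the advantage of `float` under SIMD is simply lane count: a 256-bit register holds 8 `float`s but only 4 `double`s, so one instruction does twice the work. The intrinsics below are the real Intel ones; building this requires an AVX-capable CPU and e.g. `-mavx`.

```cpp
#include <immintrin.h>

void mul8f(const float* a, const float* b, float* out) {
    __m256 va = _mm256_loadu_ps(a);      // 8 floats per instruction
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_mul_ps(va, vb));
}

void mul4d(const double* a, const double* b, double* out) {
    __m256d va = _mm256_loadu_pd(a);     // only 4 doubles per instruction
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_mul_pd(va, vb));
}
```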
You'd know if you were doing GPGPU. Explicit vectorization using compiler intrinsics is also a choice – one you could make, for sure, but it requires quite a cost-benefit analysis. Possibly your compiler is able to auto-vectorize some loops, but this is usually limited to "obvious" applications, such as multiplying each number in a `vector<float>` by another `float`, and this case is not so obvious IMO. Even if you `pow` each number in such a vector by the same `int`, the compiler may not be smart enough to vectorize this effectively, especially if `pow` resides in another translation unit and there is no effective link-time code generation.
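A hedged sketch of that difference (function names are mine): most compilers will auto-vectorize the first loop at `-O2`/`-O3`, while the second usually stays scalar because the call to `std::pow` is opaque to the optimizer unless it gets inlined, e.g. via LTO.

```cpp
#include <cmath>
#include <vector>

void scale_all(std::vector<float>& v, float s) {
    for (float& x : v)
        x *= s;                                   // the "obvious" case: trivially vectorizable
}

void pow_all(std::vector<float>& v, int e) {
    for (float& x : v)
        x = static_cast<float>(std::pow(x, e));   // library call typically blocks SIMD
}
```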
If you are not ready to consider changing the whole structure of your program to allow effective use of SIMD (including GPGPU), and you're not on an architecture where `float` is indeed much cheaper by default, I suggest you stick with `double` by all means, and consider `float` at best a storage format that may be useful to conserve RAM or to improve cache locality (when you have a lot of them). Even then, measuring is an excellent idea.
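As a sketch of the "storage format" idea (again mine, purely illustrative): keep the data compact as `float` to halve the cache footprint, but accumulate in `double` so the arithmetic itself doesn't lose accuracy.

```cpp
#include <vector>

double mean(const std::vector<float>& v) {
    double acc = 0.0;                // accumulate in double for accuracy
    for (float x : v)
        acc += x;                    // each float widens to double here
    return v.empty() ? 0.0 : acc / static_cast<double>(v.size());
}
```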
That said, you could try ivaigult's algorithm (only with `double` for the intermediates and for the result), which is related to a classical algorithm called Egyptian multiplication (known by a variety of other names), except that the operands are multiplied rather than added. I don't know how `pow(double, double)` works exactly, but it is conceivable that this algorithm could be faster in some cases. Again, you should be OCD about benchmarking.
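For reference, here is a sketch of that multiplicative scheme, better known as exponentiation by squaring, with `double` used for the intermediates and the result; whether it actually beats `pow(double, double)` for small integer exponents is exactly what the benchmarking is for.

```cpp
// base^exp for a non-negative integer exponent, via exponentiation by squaring
double ipow(double base, unsigned exp) {
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1u)                // current binary digit of the exponent
            result *= base;
        base *= base;                // square for the next digit
        exp >>= 1;
    }
    return result;
}
```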