12

The overloaded function float pow(float base, int iexp) was removed in C++11, and pow now returns a double. In my program I am computing lots of these powers (in single precision) and I am interested in the most efficient way to do it.

Is there some special function (in standard libraries or any other) with the above signature?

If not, is it better (in terms of single-precision performance) to explicitly cast the result of pow to float before any other operations (which would otherwise promote everything else to double), or to cast iexp to float and use the overload float pow(float base, float exp)?

EDIT: Why do I need float rather than double?

The primary reason is RAM -- I need tens or hundreds of GB, so this reduction is a huge advantage. So from a float I need to get a float, and I need the most efficient way to achieve that (fewest casts, already-optimized algorithms, etc.).

Michal
  • Have you measured this as a performance problem? Depending on target, you might find that the `double` version is *faster* – Caleth Jan 16 '18 at 12:11
  • Personally, I'd at least be a little wary of pow(float, float), since x^k is easier to compute when k is an integer. That said, benchmarking is probably the only way to decide which is faster. – Caninonos Jan 16 '18 at 12:17
  • @Caninonos Actually, according to [cpp reference](http://en.cppreference.com/w/cpp/numeric/math/pow), all of these integer functions have been removed anyway... (comment to overload 7: *"If any argument has integral type, it is cast to double [...]"*) – Aconcagua Jan 16 '18 at 12:31
  • [This answer](https://stackoverflow.com/questions/5627030/why-was-stdpowdouble-int-removed-from-c11) might be interesting... Having this in mind, I tend to recommend the float+float overload, but a definitive answer is not possible without benchmarking both variants (and the results are most likely compiler-specific...). – Aconcagua Jan 16 '18 at 12:46
  • One benefit of using float instead of double is that it allows you to fit more items into a vectorized operation, but only if the subsequent calculation using that data can be vectorized and benefit from it. – Non-maskable Interrupt Jan 16 '18 at 12:52
  • I did a simple benchmark and it seems that casting `iexp` into float is the fastest way. I will test it in the full program – Michal Jan 16 '18 at 13:02
  • @Non-maskableInterrupt not sure you could keep this at the same precision without significantly losing accuracy, if you really want vectorization (and assuming profiling supports the work) you'll probably need to do it [yourself](http://software-lisc.fbk.eu/avx_mathfun/) using `exp(y*log(x))` and should examine the numerics to make sure you're not losing necessary information. – Mgetz Jan 16 '18 at 14:13
  • @Non-maskableInterrupt Regarding Caleth's comment, I once had an actual case where `double` was much faster than `float`. We expected float to be faster because more data can be processed by the SSE instructions in one instruction, and our data were 24-bit floating point numbers. After inspecting the assembler code, the problem was that the results were always rounded to float precision with special instructions, and these consumed a lot of time, while the double operations did not use any rounding. There are compiler options to change this. – Jens Jan 16 '18 at 14:19
  • What makes you think that a pow that used floats would be faster? Pow is often implemented using exp and log; both of which are cpu instructions on doubles which are very well optimised. When it comes to floating point, size does not matter. Not these days anyway. My money is on float being slower on modern common chipsets. – Bathsheba Jan 16 '18 at 17:57
  • I wonder if you could just call an intrinsic for your particular CPU. On ARM, a single point precision would be faster, so there is some merit here. – Michael Dorgan Jan 16 '18 at 18:02
  • @Bathsheba On architectures like GPU size does matter, for memory bandwidth reasons (among others), and NVIDIA is pushing half-precision floating point data types. – Mikhail Jan 16 '18 at 19:28
  • @Non-maskableInterrupt from the OP's edit he's doing the operation for millions of elements, so obviously SIMD is a good choice here – phuclv Jan 17 '18 at 01:43
  • @Mgetz the OP wants `float pow(float base, int iexp)` so probably you don't need the complex `exp(y*log(x))` and can just use exponentiation by squaring – phuclv Jan 17 '18 at 01:50
  • @LưuVĩnhPhúc If OP is pushing hundreds of GB through `pow()`, the best solution seems to be moving it to a GPU compute shader. – Non-maskable Interrupt Jan 17 '18 at 03:04

5 Answers

2

You could easily write your own fpow using exponentiation by squaring.

float my_fpow(float base, unsigned exp)
{
    float result = 1.f;
    while (exp)
    {
        if (exp & 1)        // current exponent bit is set: fold in this power
            result *= base;
        exp >>= 1;
        base *= base;       // square the base for the next bit
    }

    return result;
}


Boring part:

This algorithm gives the best accuracy that can be achieved with the float type when |base| > 1.

Proof:

Suppose we want to calculate pow(a, n), where a is the base and n is the exponent.
Let's define b_1 = a^1, b_2 = a^2, b_3 = a^4, b_4 = a^8, and so on, i.e. b_i = a^(2^(i-1)).

Then a^n is the product over all such b_i where the i-th bit is set in n.

So we have the ordered set B = {b_k1, b_k2, ..., b_km}, where for every j the bit k_j is set in n.

The following obvious algorithm A can be used to minimize rounding error:

  • If B contains a single element, that element is the result
  • Pick the two elements p and q from B with the smallest modulus
  • Remove them from B
  • Calculate the product s = p*q and put it into B
  • Go to the first step

Now, let's prove that the elements of B can simply be multiplied from left to right without losing accuracy. It follows from the fact that

b_j > b_1 * b_2 * ... * b_(j-1)

because b_j = b_(j-1) * b_(j-1) = b_(j-1) * b_(j-2) * b_(j-2) = ... = b_(j-1) * b_(j-2) * ... * b_1 * b_1.

Since b_1 = a^1 = a and its modulus is greater than one, every factor in that product has modulus greater than one, hence indeed

b_j > b_1 * b_2 * ... * b_(j-1)

Hence we may conclude that during multiplication from left to right, the accumulator variable is smaller (in modulus) than any remaining element of B.

Then the expression result *= base; (except on the very first iteration, of course) multiplies the two smallest numbers currently in B, so the rounding error is minimal. That is, the code implements algorithm A.
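As Mysticial suggests in the comments, a variant that accumulates in double and narrows once at the end incurs only a single final rounding to float; a sketch under that assumption (the name `my_fpow_d` is illustrative):

```cpp
#include <cmath>

// Same squaring loop, but all intermediate products are kept in
// double; only one rounding to float happens, at the return.
float my_fpow_d(float base, unsigned exp)
{
    double b = base;
    double result = 1.0;
    while (exp)
    {
        if (exp & 1)
            result *= b;
        exp >>= 1;
        b *= b;
    }
    return static_cast<float>(result);
}
```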

ivaigult
  • This will be less accurate than any overload of `pow`, and you don't provide any numbers that would indicate it's faster than the standard `pow`. –  Jan 16 '18 at 13:59
  • @hvd But, why? libstdc++ does the same trick, except `double result`. – ivaigult Jan 16 '18 at 14:22
  • @hvd I've just tried on a bunch of random numbers, and my solution gives _exactly_ the same accuracy as std::pow. – ivaigult Jan 16 '18 at 14:32
  • Because this introduces additional rounding: it rounds after each individual multiplication. Are you saying the additional rounding will definitely not affect the result? If so, if you can include why in your answer, I'll be happy to remove my downvote. But when I test with random numbers, I *do* see cases where the answer is different. –  Jan 16 '18 at 14:49
  • @hvd You are assuming that the pow function should be as close as possible to `a*a*a*...` where **`a` is a real number**. Considering that the [pow definition](https://en.wikipedia.org/wiki/Exponentiation) is `a*a*a*a...` where **`a` is a float**, the fact that `pow` applied to a double does not give the same result is not disqualifying; on the contrary. Actually the only source of error in this function is the fact that [float multiplication is not associative](https://stackoverflow.com/questions/10371857/is-floating-point-addition-and-multiplication-associative) – Oliv Jan 16 '18 at 17:48
  • @Oliv `pow` is not defined in the standard as repeated multiplication, so you cannot use the inherent multiple rounding of multiple floating point multiplications as an excuse for multiple rounding in `pow`. –  Jan 16 '18 at 18:02
  • What you can do is use `double` for the internal calculation and convert back to `float`. Individually, `double` usually isn't any slower than `float` (except for stuff like division). The `float` <-> `double` conversions are fast and only happen once in each direction. – Mysticial Jan 16 '18 at 18:30
  • @Mysticial Extra precision makes it less likely, but it's still possible even then to get a different result than with `std::pow`. A simple brute force gives `base = std::nextafterf(1, 2); exp = 2897;` as an example. There are plenty of other examples with smaller exponents too. –  Jan 16 '18 at 19:16
  • @hvd With `double` [it runs](http://coliru.stacked-crooked.com/a/d0ab1c395bff36ee) silently. – ivaigult Jan 16 '18 at 20:17
  • I believe perfect accuracy for `|base| < 1` can be achieved by multiplication in reverse order. I will add some changes tomorrow. – ivaigult Jan 16 '18 at 20:20
  • @hvd If the point is to be the same as `std::pow`, then it'll be pretty hard to find an exact replacement for it since you need to match whatever (single-ulp) errors it gives. The standard allows implementations to vary slightly like this. But if the point is to retain accuracy, then `double` should be sufficient and may actually be better than `std::pow`. – Mysticial Jan 16 '18 at 20:28
  • @ivaigult `std::nextafterf`, not `std::nextafter`, but I see you also changed the parameter from `float` to `double`, not just the `result` variable. I hadn't done that, and should have: I see now that not doing so had left part of the calculations as `float` when Mysticial had meant to change *all* the calculations to `double`. I'm pleasantly surprised that a brute force doesn't readily find cases where the results are different. Now to analyse if this is really true for all cases. Downvote removed meanwhile, and I encourage you to edit your answer to include the `double` version. –  Jan 16 '18 at 21:39
  • @Mysticial Of course for cases where `std::pow` does not return the mathematically correct result rounded to the precision of the return type, and a custom implementation does, sure, agreed that there's no point in generating the exact same value as `std::pow`. This answer had given poorer results than `std::pow`. With your suggestion that may no longer be the case, and if so, I have no problems with that part. The question still remains though: is it faster? That was the primary motivation of the OP. –  Jan 16 '18 at 21:56
  • I think there is something wrong here: `b(j-1) * b(j-1) = b(j-1) * (b(j-2) * b(j-2)) != (b(j-1) * b(j-2)) * b(j-2))`. Moreover, you speak about accuracy, but accuracy is relative to a target value. What is your `pow` function intended to compute: `a*a*a*....` where `a` is a real, or where `a` is a float, or some other definition? – Oliv Jan 17 '18 at 08:08
  • @Oliv You're talking about `float` multiplication, while I'm trying to show that the accumulator variable (`result` in the code) is _algebraically less_ than the rest of the numbers we need to multiply. – ivaigult Jan 17 '18 at 08:24
  • @Oliv sure, we could compare what we got with the target value or we could greedily minimize rounding error by selecting what `*` to perform on every iteration (what I did). – ivaigult Jan 17 '18 at 08:28
  • @Oliv Ok, I dug into [glibc sources](https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/sysdeps/ieee754/dbl-64/e_pow.c#L65) and found out that a better accuracy could be reached for some bases using `x^y =e^(y log (X))` formula. So, this algorithm could be optimal only in the sense of putting brackets to `a*a*a*...*a` expression. – ivaigult Jan 17 '18 at 12:03
2

Another question that can only be honestly answered with "wrong question". Or at least: "Are you really willing to go there?". float theoretically needs ca. 80% less die space (for the same number of cycles) and so can be much cheaper for bulk processing. GPUs love float for this reason.

However, let's look at x86 (admittedly, you didn't say what architecture you're on, so I picked the most common). The price in die space has already been paid. You literally gain nothing by using float for calculations. Actually, you may even lose throughput because additional extensions from float to double are required, and additional rounding to intermediate float precision. In other words, you pay extra to have a less accurate result. This is typically something to avoid except maybe when you need maximum compatibility with some other program.

See Jens' comment as well. The compiler options he mentions give the compiler permission to disregard some language rules to achieve higher performance; needless to say, this can sometimes backfire.

There are two scenarios where float might be more efficient, on x86:

  • GPU (including GPGPU), in fact many GPUs don't even support double and if they do, it's usually much slower. Yet, you will only notice when doing very many calculations of this sort.
  • CPU SIMD aka vectorization

You'd know if you did GPGPU. Explicit vectorization by using compiler intrinsics is also a choice – one you could make, for sure, but this requires quite a cost-benefit analysis. Possibly your compiler is able to auto-vectorize some loops, but this is usually limited to "obvious" applications, such as where you multiply each number in a vector<float> by another float, and this case is not so obvious IMO. Even if you pow each number in such a vector by the same int, the compiler may not be smart enough to vectorize this effectively, especially if pow resides in another translation unit, and without effective link time code generation.
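To illustrate the "obvious" case: a loop like the following, where the small integer exponent is fixed and the squaring is written inline (so there is no opaque call into libm), is the kind of code an auto-vectorizer can handle. The function name `pow4_inplace` is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Raise every element to the 4th power in place, via x^4 = (x^2)^2.
// Two multiplies per element and no function call: a good candidate
// for auto-vectorization (e.g. SSE/AVX on x86).
void pow4_inplace(std::vector<float>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        float x2 = v[i] * v[i];
        v[i] = x2 * x2;
    }
}
```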

If you are not ready to consider changing the whole structure of your program to allow effective use of SIMD (including GPGPU), and you're not on an architecture where float is indeed much cheaper by default, I suggest you stick with double by all means, and consider float at best a storage format that may be useful to conserve RAM, or to improve cache locality (when you have a lot of them). Even then, measuring is an excellent idea.

That said, you could try ivaigult's algorithm (only with double for the intermediate and for the result), which is related to a classical algorithm called Egyptian multiplication (and a variety of other names), only that the operands are multiplied and not added. I don't know how pow(double, double) works exactly, but it is conceivable that this algorithm could be faster in some cases. Again, you should be OCD about benchmarking.

Arne Vogel
  • My main reason to go from `double` to `float` is RAM -- I really need tens or hundreds of GB, so saving half is a huge advantage. And because I am calling `pow` a lot, I need to be sure that `float -> pow -> float` is as efficient as it can be. – Michal Jan 16 '18 at 17:31
  • @Michal As you have GBs of data, you have to use SIMD. With AVX you can calculate powers for 8 floats at once (4 with SSE, 16 with AVX-512). [Some libraries for this](https://stackoverflow.com/q/25936031/995714). An even better choice would be GPGPU, as GPUs have a lot more execution units than a CPU – phuclv Jan 17 '18 at 01:47
2

If you're targeting GCC you can try

float __builtin_powif(float, int)

I have no idea about its performance, though.
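A guarded sketch of how one might use it (the wrapper name `powi_f` and the fallback path are illustrative, not part of any library):

```cpp
#include <cmath>

// Use the GCC/Clang builtin when available; otherwise fall back to
// std::pow, which computes in double and narrows the result.
inline float powi_f(float base, int iexp)
{
#if defined(__GNUC__)
    return __builtin_powif(base, iexp);
#else
    return static_cast<float>(std::pow(base, iexp));
#endif
}
```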

Boris L.
1

Is there some special function (in standard libraries or any other) with the above signature?

Unfortunately, not that I know of.


But, as many have already mentioned, benchmarking is necessary to understand whether there is even an issue at all.

I've assembled a quick benchmark online. Benchmark code:

#include <iostream>
#include <vector>
#include <boost/timer/timer.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real_distribution.hpp>
#include <cmath>

int main ()
{
    boost::random::mt19937 gen;
    boost::random::uniform_real_distribution<> dist(0, 10000000);

    const size_t size = 10000000;
    std::vector<float> bases(size);
    std::vector<float> fexp(size);
    std::vector<int> iexp(size);
    std::vector<float> res(size);

    for(size_t i=0; i<size; i++)
    {
        bases[i] = dist(gen);
        iexp[i] = std::floor(dist(gen));
        fexp[i] = iexp[i];
    }

    std::cout << "float pow(float, int):" << std::endl;
    {
        boost::timer::auto_cpu_timer timer;
        for(size_t i=0; i<size; i++)
            res[i] = std::pow(bases[i], iexp[i]);
    }

    std::cout << "float pow(float, float):" << std::endl;
    {
        boost::timer::auto_cpu_timer timer;
        for(size_t i=0; i<size; i++)
            res[i] = std::pow(bases[i], fexp[i]);
    }
    return 0;
}

Benchmark results (quick conclusions):

  • gcc: c++11 is consistently faster than c++03.
  • clang: indeed the int-version of c++03 seems a little faster. I'm not sure if it is within the margin of error, since I only ran the benchmark online.
  • Both: even with c++11 calling pow with int seems to be a tad more performant.

It would be great if others could verify if this holds for their configurations as well.

AMA
0

Try using powf() instead. This is a C99 function that should also be available in C++11.

ivan.ukr
  • powf has the same signature as std::pow(float, float), which he mentioned in the question, and this answer doesn't really provide any evidence that it might be faster than std::pow(float, int) – Caninonos Jan 16 '18 at 12:25