
So, in the divide & conquer course we were taught:

  1. Karatsuba multiplication
  2. Fast exponentiation

Now, given 2 positive integers a and b, is operator* faster than a karatsuba(a, b) implementation, and is pow(a, b) faster than

int fast_expo(int base, int exp)
{
    if (exp == 0) {
        return 1;
    }
    if (exp == 1) {
        return base;
    }
    int half = fast_expo(base, exp / 2);  // recurse once and reuse the result
    if (exp % 2 == 0) {
        return half * half;
    }
    else {
        return base * half * half;
    }
}

I ask this because I wonder whether these algorithms serve only a teaching purpose, or whether equivalent functionality is already built into C++.

  • `int`s don't go high enough for those things to be useful. The crazy optimizations that go into `int` multiplication are all electrical engineering by people at companies like Intel. – user2357112 Jun 24 '16 at 22:45
  • So they are useful only when I need to use very large numbers? – Myrky Jun 24 '16 at 22:51
  • This is a good example for teaching about algorithms and boundary conditions. The actual output of compilers is highly optimized for the architecture of the CPU that the code will run on. – stark Jun 24 '16 at 22:54
  • There is no language C/C++. If you mean the two different languages C and C++, pick **one** (I changed to C++, as you use C++ notation for the operator) and clarify your question. And format this mess properly! – too honest for this site Jun 24 '16 at 23:13
  • "Most optimal" doesn't really make sense without knowing the context, i.e. the problem being solved. – Tamás Szelei Jun 24 '16 at 23:14
  • @TamásSzelei my bad, I meant most optimal time-wise – Myrky Jun 24 '16 at 23:20

4 Answers


Karatsuba multiplication is a special technique for large integers. It is not comparable to the built-in C++ * operator, which multiplies operands of basic types like int and double.

To take advantage of Karatsuba, you have to be using multi-precision integers made up of at least around 8 words (512 bits, if these are 64-bit words). The break-even point at which Karatsuba becomes advantageous is somewhere between 8 and 24 machine words, according to the accepted answer to this question.
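To make the idea concrete, here is a one-level toy sketch (my own illustration, not GMP's implementation; karatsuba32 is a made-up name, and a real implementation recurses on arrays of machine words, where the technique actually pays off):

#include <cstdint>

// Toy one-level Karatsuba on 32-bit operands. It replaces the four
// half-width multiplications of schoolbook multiplication with three:
//   x*y = hh*2^32 + ((hx+lx)*(hy+ly) - hh - ll)*2^16 + ll
uint64_t karatsuba32(uint32_t x, uint32_t y)
{
    uint32_t hx = x >> 16, lx = x & 0xFFFF;  // high/low 16-bit halves of x
    uint32_t hy = y >> 16, ly = y & 0xFFFF;  // high/low 16-bit halves of y

    uint64_t hh = (uint64_t)hx * hy;         // high * high
    uint64_t ll = (uint64_t)lx * ly;         // low * low
    uint64_t mm = (uint64_t)(hx + lx) * (hy + ly) - hh - ll;  // cross terms

    return (hh << 32) + (mm << 16) + ll;
}

The saved multiplication costs a few extra additions, which is why the trick only wins once operands are many machine words long.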

The pow function, which works with a pair of floating-point operands of type double, is not comparable to your fast_expo, which works with operands of type int. They are different functions with different requirements. With pow, you can calculate the cube root of 5: pow(5, 1/3.0). If that's what you would like to calculate, then fast_expo is of no use, no matter how fast.
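For example (nothing assumed beyond standard <cmath>):

#include <cmath>
#include <cstdio>

int main()
{
    // pow accepts fractional exponents; an int-only fast_expo cannot.
    double cbrt5 = std::pow(5.0, 1.0 / 3.0);  // cube root of 5
    std::printf("%f\n", cbrt5);               // prints roughly 1.709976
}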

There is no guarantee that your compiler or C library's pow is absolutely the fastest way for your machine to exponentiate two double-precision floating-point numbers.

Optimization claims in floating-point can be tricky, because it often happens that multiple implementations of the "same" function do not give exactly the same results down to the last bit. You can probably write a fast my_pow that is only good to five decimal digits of precision, and in your application, that approximation might be more than adequate. Have you beaten the library? Hardly; your fast function doesn't meet the requirements that would qualify it as a replacement for the pow in the library.
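Such a my_pow might look like this (a sketch under my own assumptions: it leans on the identity a^b = exp(b * ln a), works only for a > 0, and handles none of the special cases a conforming pow must):

#include <cmath>

// Hypothetical "fast" pow: trades precision and special-case handling
// (a <= 0, infinities, NaN) for the speed of single-precision math.
float my_pow(float a, float b)
{
    return std::exp(std::log(a) * b);  // float overloads of exp and log
}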

Kaz

operator* and other standard operators usually map to the primitives provided by the hardware. Where such primitives don't exist (e.g. 64-bit long long on IA-32), the compiler emulates them at a performance penalty (gcc does that in libgcc).
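As a sketch of where that boundary shows up (what actually gets generated is target- and compiler-specific):

// int multiplication typically compiles to a single machine instruction;
// this 64-bit multiply on a 32-bit target may instead expand to several
// instructions or a call into the compiler runtime (e.g. __muldi3 in libgcc).
long long mul64(long long a, long long b)
{
    return a * b;
}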

The same goes for std::pow. It is part of the standard library and isn't mandated to be implemented in any particular way. GNU libc implements pow(a,b) as exp(log(a) * b). exp and log are quite involved, written for optimal performance with IEEE 754 floating point in mind.
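You can see the identity at work yourself (for positive bases only; a real pow must also handle zero and negative bases, infinities, and NaN):

#include <cmath>
#include <cstdio>

int main()
{
    // a^b == exp(log(a) * b) for a > 0
    double a = 2.5, b = 7.0;
    std::printf("%.10g\n", std::pow(a, b));             // 610.3515625
    std::printf("%.10g\n", std::exp(std::log(a) * b));  // same, up to rounding
}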


As for your suggestions:

Karatsuba multiplication for smaller numbers isn't worth it. The multiply machine instruction provided by the processor is already optimized for speed and power usage for the standard data types in use. With bigger numbers, 10-20 times the register capacity, it starts to pay off:

In the GNU MP Bignum Library, there used to be a default KARATSUBA_THRESHOLD as high as 32 for non-modular multiplication (that is, Karatsuba was used when n >= 32w, with typically w = 32); the optimal threshold for modular exponentiation tends to be significantly higher. On modern CPUs, Karatsuba in software tends to be non-beneficial for things like ECDSA over P-256 (n=256, w=32 or w=64), but conceivably useful for much wider moduli as used in RSA.

Here is a list of the multiplication algorithms GNU MP uses and their respective thresholds.

Fast exponentiation doesn't apply to non-integer powers, so it's not really comparable to pow.

a3f

A good way to check the speed of an operation is to measure it. If you run the calculation a billion or so times and see how much time it takes to execute, you have your answer.
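A minimal timing sketch along those lines (standard <chrono> only; the volatile sink keeps the compiler from optimizing the loop away entirely, though a real benchmark deserves a proper benchmarking library):

#include <chrono>
#include <cstdio>

int main()
{
    volatile long long sink = 0;  // forces the loop body to be kept
    auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < 1000000000LL; ++i)
        sink = sink + i * 3;      // the operation under test
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    std::printf("%.3f s\n", elapsed.count());
}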

One thing to note: I'm led to believe that % is fairly expensive. There is a much faster way to check if something is divisible by 2:

bool check_div_two(int number)
{
    return (number >> 1) & 0x01;
}

This way you've just done a bit shift and a comparison against a mask, which I'd assume is a less expensive operation.

Michael Smith
  • Shouldn't it just be (number & 1) == 0? You're discarding the lsb, so odd numbers remain untested – Shaggi Jun 24 '16 at 23:02
  • `%` might be generally more expensive when you're dealing with unknown values, but every compiler is smart enough to recognize `x % 2 == 0` and will optimize it to the exact same code as `(x & 1) == 0`. – Benjamin Lindley Jun 24 '16 at 23:08
  • `(number>>1) & 0x01` tests whether the second bit is `1`, whereas "divisible by two" (even vs. odd) must be a test for `0` in the first bit. – Pixelchemist Jun 24 '16 at 23:10

The * operator for built-in types will almost certainly be implemented as a single CPU multiplication instruction. So ultimately this is a hardware question, not a language question. Longer code sequences, perhaps function calls, might be generated in cases where there's no direct hardware support.

It's safe to assume that chip manufacturers (Intel, AMD, et al) expend a great deal of effort making arithmetic operations as efficient as possible.

Keith Thompson