9

When working with arbitrary precision arithmetic (e.g. 512-bit integers), is there any way to get GCC to use ADC and similar instructions without using inline assembly?

A first glance at GMP's sourcecode shows that they simply have assembly implementations for every supported platform.

Here is the test code I wrote, which adds two 128-bit numbers from the command line and prints the result. (Inspired by mini-gmp's add_n):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    uint32_t a[4];
    uint32_t b[4];
    uint32_t c[4];
    uint32_t carry = 0;

    for (int i = 0; i < 4; ++i)
    {
        a[i] = strtoul (argv[i+1], NULL, 16);
        b[i] = strtoul (argv[i+5], NULL, 16);
    }

    for (int i = 0; i < 4; ++i)
    {
        uint32_t aa = a[i];
        uint32_t bb = b[i];
        uint32_t r = aa + carry;
        carry = (r < carry);
        r += bb;
        carry += (r < bb);
        c[i] = r;
    }

    printf ("%08X%08X%08X%08X + %08X%08X%08X%08X =\n", a[3], a[2], a[1], a[0], b[3], b[2], b[1], b[0]);
    printf ("%08X%08X%08X%08X\n", c[3], c[2], c[1], c[0]);

    return 0;
}

GCC -O3 -std=c99 Does not produce any adc instructions, as checked by objdump. My gcc version is i686-pc-mingw32-gcc (GCC) 4.5.2.

Mysticial
  • 464,885
  • 45
  • 335
  • 332
morrog
  • 664
  • 1
  • 8
  • 17
  • 3
    Nope. Not gonna happen that easily (if it's at all possible). That's one of the reasons *why* GMP uses inline assembly. – Mysticial Mar 29 '13 at 02:43
  • @Mysticial: I agree, but I wanted to pose the question just in case. I saw little meaningful discussion on the topic during my Google searches. – morrog Mar 29 '13 at 02:47
  • 3
    Yeah, I know exactly what you're talking about since I've tried to do the exact same before and failed miserably. Not only is GCC not able to do it, but neither will MSVC nor ICC. In short, it requires a very specialized compiler optimization pass to detect the intent - which is so niche that no compiler writer would waste time on such a thing. – Mysticial Mar 29 '13 at 02:49
  • @Mysticial, I got ICC to do this efficiently using the `_addcarry_u64` intrinsic https://stackoverflow.com/questions/29029572/multi-word-addition-using-the-carry-flag/29212615#29212615 – Z boson Mar 23 '15 at 14:23
  • @Zboson ICC and MSVC didn't have that intrinsic until very recently. (~1 year ago) – Mysticial Mar 23 '15 at 14:48
  • @Mysticial, woah, I did not realized MSVC had `_addcarry_u64`. I just tried it and indeed it produces 1x `add`, and 3x `adc` ! What's up with GCC and Clang then? – Z boson Mar 23 '15 at 16:40

1 Answers1

1

GCC will use the carry flag if it can see that it needs to:
When adding two uint64_t values on a 32-bit machine, for example, this must result in one 32-bit ADD plus one 32-bit ADC. But apart from those cases, where the compiler is forced to use the carry, it probably cannot be persuaded to do so w/o assembler. Therefore, it may be beneficial to use the biggest integer type available to allow GCC to optimize operations by effectively letting it know that the single 'components' of the value belong together.

For the simple addition, another way to calculate the carry could be to look at the relevant bits in the operands, like:

uint32_t aa,bb,rr;
bool msbA, msbB, msbR, carry;
// ...

rr = aa+bb;

msbA = aa >= (1<<31); // equivalent: (aa & (1<<31)) != 0;
msbB = bb >= (1<<31);
msbR = rr >= (1<<31);


carry = (msbA && msbB) || ( !msbR && ( msbA || msbB) );
JimmyB
  • 12,101
  • 2
  • 28
  • 44
  • 1
    Using a bigger integer type only pushes the problem to a higher level. Unless GCC has 512-bit integer types, you still need to mess with the carry logic. – Mysticial Mar 29 '13 at 14:27
  • 1
    Of course, you will definitely have to compute your own carry sooner or later. But when using `uint64_t` instead of `uint32_t`, for example, you will save yourself 50% of the 'manual' carry flag calculations and their processing time. – JimmyB Mar 29 '13 at 14:31
  • If you are going for speed (as most people are in multiprecision packages), you might want to compute the carry using aa, bb, and rr and *then* right shiftng 31 bits, saving 2 shifts. You also probably want to non-branching operators & | ~ because these bit are presumably very hard to predict and so a branch predictor will likely guess wrong while executing the expression. – Ira Baxter Mar 19 '15 at 14:15
  • 3
    Another cheap trick is to only use 31 bits of the 32 bit word (even better, 63 of 64 bits). You give up 3% or less of your storage capacity, but the carry bit gets computed for you in the MSB of the sum. – Ira Baxter Mar 19 '15 at 14:16
  • 1
    By the way, for unsigned types `carry == ((a+b) < a)`. – JimmyB Mar 19 '15 at 15:27
  • If you need multiprecision arithmethic occasionally, then doing the carry computation klunkily in C won't have any real impact on performance of the application oveall. Getting the carry managed by the compiler is only going to matter if your computation uses the multiprecision arithmetic very heavily. Then you are likely to find out that divide-multiprecision is the real cost. – Ira Baxter Mar 20 '15 at 10:59
  • @HannoBinder: ((a+b) – Ira Baxter Mar 20 '15 at 11:02
  • 2
    Maybe it is worth having a look at GCC's Integer Overflow Builtins: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html – Till Kolditz Sep 07 '16 at 15:34