Intel introduced ADCX, AODX to support big integer arithmetic but it is hard to generate ADCX/AODX using only one function: _addcarryx_u64
Suppose that for big integer addition of A and B where the integer is represented by an array of uint64_t
denoted by pa
and pb
respectively. Here's a possible implementation (pa
and pb
are kept to be of same length n
and the overflow of the result is not the concern` for simplicity).
#include <immintrin.h>
using limb_type = uint64_t;
void add(const limb_type * __restrict__ pa, const limb_type * __restrict__ pb, limb_type * __restrict__ pr, unsigned n) {
unsigned char carry = 0;
unsigned i;
for(i=0;i<n;i++) {
carry = _addcarryx_u64(carry, pa[i], pb[i], &pr[i]);
}
}
mov rcx, QWORD PTR [rdi+rax] # tmp106, MEM[base: pa_16(D), index: ivtmp.11_26, offset: 0B]
add r8b, -1 # carry: to set CF Flag for subsequent adc
adc rcx, QWORD PTR [rsi+rax] # tmp106, MEM[base: pb_15(D), index: ivtmp.11_26, offset: 0B]
mov QWORD PTR [rdx+rax], rcx #* ivtmp.11, tmp106
setc r8b #, carry
add rax, 8 # ivtmp.11,
cmp r9, rax # _20, ivtmp.11
jne .L3 #,
The problem is that the loop increment resets CF flag which in the next iteration is set by instruction number 2. Thus there are redundant instructions that are being executed: 2nd instruction (ADD) and 5th instruction SETC.
Solution: The need to save and set the CF flag can be eliminated if somehow it is possible to save the carry to OF flag and use ADOX instead of ADC as generated by GCC/CLANG.
Since inline ASM is not recommended by GCC and I want to consider it as my last resort but is there a way to do this?
Note that this is a performance critical loop and that's why I want to eliminate 2 instructions.