The kind of division that produces a non-negative modulo is called Euclidean division.
For C implementations and more background, see Division and Modulus for Computer Scientists. (also related: What's the difference between “mod” and “remainder”?).
This is a special case, where we know the divisor is positive, which allows the faster implementation suggested by Ped7g. It's probably optimal or at least close to it.
Interestingly, we can write it in C in a way that compiles to the same asm as Ped7g's hand-written version (or close to it, depending on gcc vs. clang). See it on the Godbolt compiler explorer.
// https://stackoverflow.com/questions/40861023/how-do-i-get-a-positive-modulo-on-a-negative-dividend
// uses unsigned carry to detect when a negative 2's complement integer wrapped to non-negative
long modE_positive_divisor_2s_complement( long D, long d )
{
long r = D%d;
unsigned long tmp = (unsigned long)r + (unsigned long)d;
// detect carry by looking for unsigned wraparound:
// that means remainder was negative (not zero), so adding the (non-negative) divisor is what we need
r = (tmp < (unsigned long)r) ? (long)tmp : r;
return r;
}
With clang3.9 -O3, we get:
mov rax, rdi
cqo
idiv rsi
add rsi, rdx
cmovae rsi, rdx
mov rax, rsi
ret
gcc6.2 uses an extra CMP before the CMOV :(
On Godbolt, I included the original general-case C function and a version that tells the compiler to assume positive d
(using __builtin_unreachable()
). For the latter, gcc does about as well as with this, using a test rdx,rdx
/ cmovs
to do the conditional add.
It works identically for 32-bit vs. 64-bit long
(with EAX instead of RAX, etc.), so use int32_t
if you care about performance and don't need 64-bit integers. (idiv r64
is much slower than idiv r32
).