Many programming languages, including the older standards of C and C++, guarantee the division rule, which is
a = b * (a / b) + a % b
even if the exact values of a/b
and a%b
are left undefined.[0] This can be exploited to calculate the desired result in many languages and platforms using (an equivalent of) the following code:
int divF(int a, int b) { return a / b - (a % b < 0); }
This is the version from @TomasPetricek's answer. However, it works only for b > 0
!
The following code works for any b != 0
:[1]
int sign(int x) { return (x > 0) - (x < 0); }
int divF2(int a, int b) { return a / b - (sign(a % b) == -sign(b)); }
However, the rounding-down division (aka flooring division, aka Knuth's division) is not always desirable. It is argued[2] that Euclidean division is the most generally useful one. It rounds down for b > 0
, and up for b < 0
. It has the nice property that the value of the compatibly defined remainder is always non-negative, for all a
and b
, independently of their signs. Additionally it coincides with bit-shifts and masking on twos complement machines for power-of-two divisors. Yes, it is also faster to calculate:
int divE(int a, int b) {
int c = a % b < 0;
return a / b + (b < 0 ? c : -c);
}
All three versions generate branchless code with clang -O3
on amd64. However, on twos complement architectures, the following may be marginally faster:
int divE2(int a, int b) { return a / b + (-(a % b < 0) & (b < 0 ? 1 : -1)); }
Generated code
divF:
cdq
idiv esi
sar edx, 31
add eax, edx
divF2:
cdq
idiv esi
xor ecx, ecx
test edx, edx
setg cl
sar edx, 31
add edx, ecx
xor ecx, ecx
test esi, esi
setg cl
shr esi, 31
sub esi, ecx
xor ecx, ecx
cmp edx, esi
sete cl
sub eax, ecx
Chazz:
cdq
idiv esi
test edx, edx
cmove edi, esi
xor edi, esi
sar edi, 31
add eax, edi
divE:
cdq
idiv esi
mov ecx, edx
shr ecx, 31
sar edx, 31
test esi, esi
cmovs edx, ecx
add eax, edx
divE2:
cdq
idiv esi
sar edx, 31
shr esi, 31
lea ecx, [rsi + rsi]
add ecx, -1
and ecx, edx
add eax, ecx
Benchmarks
simple truncating division:
2464805950: 1.90 ns -- base
euclidean division:
2464831322: 2.13 ns -- divE
2464831322: 2.13 ns -- divE2
round to -inf for all b:
1965111352: 2.58 ns -- Chazz
1965111352: 2.64 ns -- divF2
1965111352: 5.02 ns -- Warty
round to -inf for b > 0, broken for b < 0:
1965143330: 2.13 ns -- ben135
1965143330: 2.13 ns -- divF
1965143330: 2.13 ns -- Tomas
4112595000: 5.79 ns -- runevision
round to -inf, broken for b < 0 or some edge-cases:
4112315315: 2.24 ns -- DrAltan
1965115133: 2.45 ns -- Cam
1965111351: 7.76 ns -- LegsDrivenCat
Compiled with clang -O3
on FreeBSD 12.2, i7-8700K CPU @ 3.70GHz. The first column is a checksum grouping algorithms producing the same result. base
is the simplest truncating division used to measure the testing overhead.
The test-bed:
static const int N = 1000000000;
int rng[N][2], nrng;
void push(int a, int b) { rng[nrng][0] = a, rng[nrng][1] = b, ++nrng; }
SN_NOINLINE void test_case(int (*func)(), const char *name) {
struct timespec t0, t1;
clock_gettime(CLOCK_PROF, &t0);
int sum = func();
clock_gettime(CLOCK_PROF, &t1);
double elapsed = (t1.tv_sec - t0.tv_sec)*1.e9 + (t1.tv_nsec - t0.tv_nsec);
printf("%10u: %5.2f ns -- %s\n", sum, elapsed/N, name);
}
#define ALL_TESTS(X) X(base) X(divF) X(divF2) X(divE) X(divE2) X(ben135) X(DrAltan) X(Cam) X(Tomas) X(Warty) X(Chazz) X(runevision) X(LegsDrivenCat)
#define LOOP_TEST(X) \
SN_NOINLINE int loop_##X() { \
int sum = 0; \
for(int i = 0; i < N; ++i) sum += X(rng[i][0], rng[i][1]); \
return sum; \
} \
/**/
ALL_TESTS(LOOP_TEST)
int main() {
srandom(6283185);
push(INT_MAX, 1); push(INT_MAX, -1); push(INT_MIN, 1);
push(INT_MAX, 2); push(INT_MAX, -2); push(INT_MIN, 2); push(INT_MIN, -2);
while(nrng < N) {
int a = random() - 0x40000000, b;
do b = (random() >> 16) - 0x4000; while(b == 0);
push(a,b);
}
#define CALL_TEST(X) test_case(loop_##X, #X);
ALL_TESTS(CALL_TEST)
}
Footnotes/references
- Nowadays they are defined in C and C++ to round by truncation. So negatives are rounded upwards, positives downwards. This is what hardware divisors do. This is a bummer, cause it's the least useful rounding rule.
- Daan Leijen. Division and Modulus for Computer Scientists. December 2001.
- Raymond T. Boute. The Euclidean definition of the functions div and mod. April 1992.