Optimization of subsequent calls to integer division and modulo (remainder)

Question

Integer division / and modulo % operations are often used together in programming, sometimes even on the same operands and in subsequent lines. For example, the following C function, which is a simple function that sums the result of a / of 2 numbers with the result of their %, does just that:

int sum2digits(int x, int base) {
    int n, m;
    n = x / base;
    m = x % base;
    return n + m;
}

As I know, both / and % are performed by the same machine instruction (in x86). Say, if you perform a machine instruction for integer division (div or idiv) of two numbers, a and b, then afterwards the value of a / b will be stored in the register EAX and the remainder a % b in EDX.
I wondered whether the compiler takes advantage of this quality and took a look at the assembly code. It turns out that normal compilation with gcc doesn't optimize this:

push   %rbp
mov    %rsp,%rbp
mov    %edi,-0x14(%rbp)
mov    %esi,-0x18(%rbp)
mov    -0x14(%rbp),%eax
mov    %eax,%edx
sar    $0x1f,%edx
idivl  -0x18(%rbp)
mov    %eax,-0x8(%rbp)
mov    -0x14(%rbp),%eax
mov    %eax,%edx
sar    $0x1f,%edx
idivl  -0x18(%rbp)
mov    %edx,-0x4(%rbp)
mov    -0x4(%rbp),%eax
mov    -0x8(%rbp),%edx
add    %edx,%eax
pop    %rbp
retq

This assembly code does 2 subsequent calls to idivl, but each time reads the result from another register (EAX for quotient, EDX for remainder). However, compiling with the -O changes the picture:

mov    %edi,%eax
mov    %edi,%edx
sar    $0x1f,%edx
idiv   %esi
add    %edx,%eax
retq

This code calls idiv only once, and uses its value for both computations.
Why isn't this kind of optimization a default? What is the use of calling div twice in a row? Can this optimization change the behaviour of a program in any way?
Also, and perhaps even more important, is there a way, as a programmer, to manually extract these 2 values (quotient and remainder) guaranteeing that only 1 integer division is performed by the CPU?

By default, GCC disables ***all*** optimizations unless you specify a `-O`. Not just this one... — Mysticial, Apr 09 '13 at 21:14
I've deleted my answer, since I couldn't get it to work. On the other hand, if you say `return x / base + x % base;` and enable compiler optimizations, you should get the efficient implementation. — Kerrek SB, Apr 09 '13 at 21:30
With GCC you can use inline assembly to guarantee that you get both results from one division, but I can't be bothered to look up the details, and I haven't used inline assembly for quite a while... — Seg Fault, Apr 09 '13 at 21:30
You definitely want this function to inline so `base` will be a compile-time constant, letting the compiler use a multiplicative inverse instead of an expensive `idiv` at all. Also, are you sure you want to use signed integers? Does a negative result make sense? — Peter Cordes, Oct 19 '22 at 07:35

jthill · Accepted Answer · 2013-04-10T04:52:54.697

Why isn't this kind of optimization a default?

If compilers and optimizers were perfect and debuggers could reverse-engineer code, optimization would be a universal default. But compilers don't always generate correct code, optimizers don't always preserve semantics, and debuggers can't always figure out what part(s) of a optimized program any given instruction relates to. It looks like Your compiler was installed with default options preset for absolute safety and debugging simplicity.

is there a way to manually extract these 2 values (quotient and remainder) guaranteeing that only 1 integer division is performed by the CPU?

These days the best way is exactly what you did: ask the compiler for optimized code. The div routine is a holdover from days when results from the division operators were implementation defined for negative numbers and optimizing compiles were so painfully slow that identifying things like this was better done manually.

The specific reasons GCC `-O0` doesn't optimize to one `idiv` here is that `-O0` stops it from optimizing across C statements. If you were single-stepping with a debugger, you could change the value of `base` between the two statements. See [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) for more about the "consistent debugging" that `-O0` gives, including being able to jump to any source line within the function. — Peter Cordes, Oct 19 '22 at 07:37

score 2 · Answer 2 · answered Apr 09 '13 at 21:39

You could always implement your own division:

#include <stdlib.h>
#include <stdio.h>

void mydiv(int dividend, int divisor, int* quotient, int* remainder)
{
  *quotient = dividend / divisor;
  *remainder = dividend - *quotient * divisor;
}

int testData[][2] =
{
  { +5, +3 },
  { +5, -3 },
  { -5, +3 },
  { -5, -3 },
};

int main(void)
{
  unsigned i;
  for (i = 0; i < sizeof(testData)/sizeof(testData[0]); i++)
  {
    div_t res1, res2;
    res1 = div(testData[i][0], testData[i][1]);
    mydiv(testData[i][0], testData[i][1], &res2.quot, &res2.rem);
    printf("%+d/%+d = %+d:%+d %c= %+d:%+d\n",
           testData[i][0], testData[i][1],
           res1.quot, res1.rem,
           "!="[res1.quot == res2.quot && res1.rem == res2.rem],
           res2.quot, res2.rem);
  }
  return 0;
}

Output (ideone):

+5/+3 = +1:+2 == +1:+2
+5/-3 = -1:+2 == -1:+2
-5/+3 = -1:-2 == -1:-2
-5/-3 = +1:-2 == +1:-2

This does have one division. However, it looks like gcc isn't smart enough to eliminate the multiplication and so you have one of each.

"However, it looks like gcc isn't smart enough to eliminate the multiplication" - and this, folks, is why naive "inefficient" source code always wins over hand-optimized "efficient" source code (over long enough time scales). The simpler, clearer code directly expresses its intent/behavior and lowers the bar for optimization heuristics, while the optimized code obfuscates the intent in the side-effects of whatever is currently available in the language and is most efficient in modern hardware at the time and presents a higher bar for optimization heuristics. — mtraceur, Oct 26 '21 at 23:14

score 2 · Answer 3 · edited Oct 18 '22 at 22:14

2

Why not just use div?

http://www.cplusplus.com/reference/cstdlib/div/

I'd say that's the best chance for an optimized, portable solution?

The OP wants to know about Optimization of subsequent calls to integer division and modulo (remainder) with C.

So, if you want to be optimal in your code, why not use a standard library call that defers responsibility of the optimization to the standard library implementers who may have better information of the compiler's inner workings and the available operations of the native machine (i.e. use div assembly instruction on x86). Especially when the function does EXACTLY what the OP is trying to do.

If I saw a division followed by a mod in real-life code, my immediate question would be "why didn't you use the standard library?", not "hmm, I wonder how the compiler might optimize my high-level code?".

It also answers the part of the question: Also, and perhaps even more important, is there a way, as a programmer, to manually extract these 2 values (quotient and remainder) guaranteeing that only 1 integer division is performed by the CPU?

edited Oct 18 '22 at 22:14

miken32

42,008
16
111
154

answered Apr 10 '13 at 07:55

Josh Petitt

9,371
12
56
104

Josh, I've upvoted your answer - though you're probably not still annoyed 18 months later... I've arrived here looking for talk about `div(.,.)`. I'm cool with the arguments that modern optimized compilers make it rather/potentially unnecessary but it's there and it's not harmful. You answered the original question. I like answers that say 'here's your answer but how about these other thoughts that might be useful'. I get annoyed with 'You don't want to be doing that' or 'Why would you do that?' answers unless what you're doing is positively bad. – Persixty Nov 26 '14 at 12:06
I think an important point in favor of this answer is that it is much easier to write a compiler optimization that replaces calls to a standard function which implements a specific behavior with efficient inline machine code than it is to write a compiler optimization that detects that two or more operations next to each other together implement that same behavior. – mtraceur Oct 26 '21 at 23:32

Optimization of subsequent calls to integer division and modulo (remainder)

3 Answers3