Division and modulus using single divl instruction (i386, amd64)

Question

I was trying to come up with inline assembly for gcc to get both division and modulus using single divl instruction. Unfortunately, I am not that good at assembly. Could someone please help me with this? Thank you.

See http://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-and-asm/35959859#35959859 where I used this as an example of MSVC inline asm vs. GNU C inline asm. (Including a working `divl` wrapper function that can inline, with only one instruction inside the inline asm statement, the same as D0SBoots's correct answer here.) — Peter Cordes, Nov 28 '16 at 04:45
See also https://stackoverflow.com/questions/32741032/how-to-access-c-struct-variables-from-inline-asm/32747262#32747262 for a near duplicate, showing that you don't need to use inline asm (the compiler does it for you), but also showing how to do it correctly with inline asm. (No significant difference to DOSBoot's answer) — Peter Cordes, Mar 05 '18 at 05:08

score 9 · Answer 1 · answered May 28 '12 at 08:03

You're looking for something like this:

__asm__("divl %2\n"
       : "=d" (remainder), "=a" (quotient)
       : "g" (modulus), "d" (high), "a" (low));

Although I agree with the other commenters that usually GCC will do this for you and you should avoid inline assembly when possible, sometimes you need this construct.

For instance, if the high word is less than the modulus (the divisor), the quotient fits in 32 bits and it is safe to perform the division like this. However, GCC isn't smart enough to realize this: in the general case, dividing a 64-bit number by a 32-bit number can overflow the 32-bit quotient, so it calls a library routine to do the extra work. (Replace with 128-bit/64-bit for 64-bit ISAs.)

`"g"` for the source operand is not correct, because DIV doesn't support immediate operands. Use `"rm"`. (Actually, you might want `"g"` so your code will break at compile-time if you're ever shooting yourself in the foot by forcing the use of DIV when the divisor is a compile-time constant.) You could maybe also use `__builtin_constant_p` to detect compile-time-constant operands. It should work well with gcc, but clang evaluates it before function inlining (so you get false negatives). — Peter Cordes, Nov 28 '16 at 04:36
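A minimal sketch of a wrapper that follows the `"rm"` suggestion from the comment above. The names `div64_32` and `struct divmod32` are made up for illustration, and the matching constraints (`"0"`, `"1"`) are just one way to tie the inputs to eax/edx. The caller has to guarantee `high < divisor`; otherwise the quotient does not fit in 32 bits and `divl` faults. The `#else` branch is only there so the sketch still builds on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: both outputs of one divl. */
struct divmod32 {
    uint32_t quot;
    uint32_t rem;
};

/* Hypothetical helper: divides the 64-bit value high:low by divisor.
 * Precondition (not checked here): high < divisor. */
static inline struct divmod32 div64_32(uint32_t high, uint32_t low,
                                       uint32_t divisor)
{
    struct divmod32 r;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* divl divides edx:eax by the operand; quotient -> eax, remainder -> edx.
     * "rm" rules out an immediate, which DIV cannot encode. */
    __asm__("divl %2"
            : "=a"(r.quot), "=d"(r.rem)
            : "rm"(divisor), "0"(low), "1"(high)
            : "cc");
#else
    /* Portable fallback, assumed equivalent under the same precondition. */
    uint64_t n = ((uint64_t)high << 32) | low;
    r.quot = (uint32_t)(n / divisor);
    r.rem  = (uint32_t)(n % divisor);
#endif
    return r;
}
```

Called as `div64_32(0, 17, 3)`, this should give a quotient of 5 and a remainder of 2. When the high word is always zero, plain `/` and `%` in C are the better choice, as the other answers point out.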

score 8 · Answer 2 · answered Apr 09 '11 at 23:30

You shouldn't try to optimize this yourself. GCC already does this.

#include <stdio.h>

volatile int some_a = 18, some_b = 7;

int main(int argc, char *argv[]) {
    int a = some_a, b = some_b;
    printf("%d %d\n", a / b, a % b);
    return 0;
}

Running

gcc -S test.c -O

yields

main:
.LFB11:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    some_a(%rip), %esi
    movl    some_b(%rip), %ecx
    movl    %esi, %eax
    movl    %esi, %edx
    sarl    $31, %edx
    idivl   %ecx
    movl    %eax, %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret

Notice that the remainder, %edx, is not moved because it is also the third argument passed to printf.

EDIT: The 32-bit version is less confusing. Passing -m32 yields

main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    some_a, %eax
    movl    some_b, %ecx
    movl    %eax, %edx
    sarl    $31, %edx
    idivl   %ecx
    movl    %edx, 8(%esp)
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    printf
    movl    $0, %eax
    leave
    ret

raylu

In fact, gcc doesn't. At least not with flags I am using. Maybe I am missing something. – Apr 09 '11 at 23:32
I'm on gcc (Debian 4.5.2-4) 4.5.2, but even 4.3 does this. Are you passing -O? – raylu Apr 09 '11 at 23:34
I am using gcc 4.6.. but damn! I spent like 30 minutes playing with this before asking here and it turns out I just forgot to specify '-O3'. I had '-mtune=native' though, but it didn't help. Thanks! I should get some sleep... – Apr 09 '11 at 23:37
Why only `-O`? Use at least `-O2` so gcc will peephole the sign-extension into `edx` with `cdq` (aka `cltd` in AT&T syntax, IIRC). Anyway, if you write a function that takes two args and stores the results to globals (or returns them as a struct, or returns one and stores the other), nothing will be optimized away, and it needs fewer instructions than writing a `main()` and using `volatile`. See https://stackoverflow.com/questions/32741032/how-to-access-c-struct-variables-from-inline-asm/32747262#32747262 – Peter Cordes Mar 05 '18 at 05:04
to show the minimum needed to get the behavior described in the question. – raylu Mar 06 '18 at 10:11
Ok, but this shows gcc making clunky code, and isn't how you should actually compile. (And BTW, the OP was asking for `divl`; you could have used `unsigned` so it only needs to zero `edx` instead of sign-extending into edx:eax.) – Peter Cordes Mar 06 '18 at 10:19
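A rough sketch of the kind of test function suggested in the comments above: `unsigned` operands, with the results stored to globals so the compiler cannot discard them. The names `divmod_u32`, `quot_out` and `rem_out` are invented for this example. Compiled with something like `gcc -O2 -S`, one would expect a single `div` with `edx` zeroed first, but the exact output depends on the compiler version and flags.

```c
#include <assert.h>

/* Hypothetical globals: storing the results here keeps both computations live. */
unsigned quot_out;
unsigned rem_out;

/* Hypothetical test function: unsigned operands, so the compiler only has to
 * zero edx before the division instead of sign-extending eax into it. */
void divmod_u32(unsigned num, unsigned divisor)
{
    quot_out = num / divisor;
    rem_out  = num % divisor;
}
```

Because no `main()` or `volatile` is involved, the generated assembly for `divmod_u32` contains little besides the division and the two stores.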

score 5 · Answer 3 · answered Apr 09 '11 at 23:32

Fortunately, you don't have to resort to inline assembly to achieve this. gcc will do this automatically when it can.

$ cat divmod.c

struct sdiv { unsigned long quot; unsigned long rem; };

struct sdiv divide( unsigned long num, unsigned long divisor )
{
        struct sdiv x = { num / divisor, num % divisor };
        return x;
}

$ gcc -O3 -std=c99 -Wall -Wextra -pedantic -S divmod.c -o -

        .file   "divmod.c"
        .text
        .p2align 4,,15
.globl divide
        .type   divide, @function
divide:
.LFB0:
        .cfi_startproc
        movq    %rdi, %rax
        xorl    %edx, %edx
        divq    %rsi
        ret
        .cfi_endproc
.LFE0:
        .size   divide, .-divide
        .ident  "GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
        .section        .note.GNU-stack,"",@progbits

Yeah, you are right. Turns out I had too much beer and forgot to turn on optimizations. Thank you. I've upvoted everyone. — , Apr 09 '11 at 23:41

score 4 · Accepted Answer · answered Apr 09 '11 at 23:23

Yes -- a divl will produce the quotient in eax and the remainder in edx. Using Intel syntax, for example:

mov eax, 17
mov ebx, 3
xor edx, edx
div ebx
; eax = 5
; edx = 2

Jerry Coffin

Jerry, it is funny because `gcc` doesn't even try to optimize two of these operations into one. I'll buy you a beer if you could give me gcc assembly so that I can use it as inline function or something... – Apr 09 '11 at 23:31
False alarm. Sorry. GCC does it for me if I don't forget to specify required flags. – Apr 09 '11 at 23:42
This answer is pretty off topic, as the question is specific to GCC and you answer it with assembly in Intel syntax. Charles and raylu are all around better. – Evan Carroll Mar 05 '18 at 04:51
@EvanCarroll: What he asked for was a sequence of assembly instructions that would produce both the quotient and the remainder using only a single div instruction. Neither of those "answers" even attempts to provide that. They're both really comments, not answers. The only other actual answer here is DOSBoots'. – Jerry Coffin Mar 05 '18 at 07:14
No that's not what he's asking, read the question: *"I was trying to come up with **inline assembly for gcc** to get both division and modulus using single `divl` instruction."* – Evan Carroll Mar 05 '18 at 08:01
@EvanCarroll: Yes, "assembly". Neither of the answers you seem to like shows how to do the task in assembly language *at all*. They just advocate using C++. That's clearly not even vaguely similar to answering the question he asked. – Jerry Coffin Mar 05 '18 at 14:20

score 2 · Answer 5 · answered Dec 09 '19 at 15:02

Here is an example from the Linux kernel source that uses divl:

/*
 * do_div() is NOT a C function. It wants to return
 * two values (the quotient and the remainder), but
 * since that doesn't work very well in C, what it
 * does is:
 *
 * - modifies the 64-bit dividend _in_place_
 * - returns the 32-bit remainder
 *
 * This ends up being the most efficient "calling
 * convention" on x86.
 */
#define do_div(n, base)                     \
({                              \
    unsigned long __upper, __low, __high, __mod, __base;    \
    __base = (base);                    \
    if (__builtin_constant_p(__base) && is_power_of_2(__base)) { \
        __mod = n & (__base - 1);           \
        n >>= ilog2(__base);                \
    } else {                        \
        asm("" : "=a" (__low), "=d" (__high) : "A" (n));\
        __upper = __high;               \
        if (__high) {                   \
            __upper = __high % (__base);        \
            __high = __high / (__base);     \
        }                       \
        asm("divl %2" : "=a" (__low), "=d" (__mod)  \
            : "rm" (__base), "0" (__low), "1" (__upper));   \
        asm("" : "=A" (n) : "a" (__low), "d" (__high)); \
    }                           \
    __mod;                          \
})
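To make the arithmetic in the non-constant branch of the macro easier to follow, here is a portable sketch of the same two-step division in plain C. It is not kernel code: `do_div_sketch` is a made-up name, it takes a pointer instead of modifying a macro argument in place, and the step the macro performs with `divl` is emulated with 64-bit C arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the asm path of do_div(): divides *n by base
 * in place and returns the 32-bit remainder. base must be nonzero. */
static inline uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t low   = (uint32_t)*n;          /* low half  (eax in the macro) */
    uint32_t high  = (uint32_t)(*n >> 32);  /* high half (edx in the macro) */
    uint32_t upper = high;

    if (high) {
        upper = high % base;  /* carry of the high word into the next step */
        high  = high / base;  /* high word of the final quotient */
    }

    /* The macro does this step with divl. Since upper < base, the quotient
     * of upper:low by base is guaranteed to fit in 32 bits. */
    uint64_t partial = ((uint64_t)upper << 32) | low;
    uint32_t mod = (uint32_t)(partial % base);
    low = (uint32_t)(partial / base);

    *n = ((uint64_t)high << 32) | low;      /* write the quotient back */
    return mod;
}
```

The first step reduces the high word so that the second division cannot overflow, which is the same "high word less than the divisor" condition discussed in the inline-asm answer.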
