mod (%) in AVR assembly - __divmodhi4

Question

I'm trying to do %10 in AVR assembly.

I created a simple c file

int main()
{
  int k=19;
  int j;
  j = k%10;
  return 0;
}

which I then compiled into assembly, giving

    ldi r24,lo8(19)
    ldi r25,0
    std Y+2,r25
    std Y+1,r24
    ldd r24,Y+1
    ldd r25,Y+2
    ldi r18,lo8(10)
    ldi r19,0
    mov r22,r18
    mov r23,r19
    rcall __divmodhi4
    std Y+4,r25
    std Y+3,r24
    ldi r24,0
    ldi r25,0

How does __divmodhi4 work and where are the results stored?

For division by a compile-time constant, it's normally more efficient to multiply by a fixed-point inverse and shift the high half. GCC only chooses to do that when hardware division is wide enough to do that in one multiply instruction, though. See https://godbolt.org/z/jOV_b4 for the asm for `unsigned short foo(unsigned short k) { return k%10;}` on AVR and x86-64 GCC, with `-O3`. (unsigned division is simpler than signed division, because `-11 % 10 = -1` for example, so it takes extra shifts to get the sign handling right.) Generally look at asm for functions that take args and return. — Peter Cordes, Mar 16 '19 at 17:31
@PeterCordes you're correct as your XKCD cartoon suggests. Removed. :) — TomServo, Aug 27 '21 at 02:25

score 3 · Accepted Answer · answered Mar 16 '19 at 14:53

Since AVR does not have hardware divider, the AVR-GCC compiler has to use complex functions to perform such operations.

__divmodhi4 - one of those functions. It divides signed 16 bit integer, stored in r25:r24, by another signed 16 bit integer in r23:r22.

Returns 16-bit quotient in r23:r22 and remainder in r25:r24

You should see __divmodhi4 in the same disassembly where you see your own code.

also you can see find sources of the GCC library for example, here

thomasrutter · Answer 2 · 2021-08-25T03:45:37.587

In order to understand for myself how this function worked, I wrote a version of it in c.

(if you have the means to step through AVR assembler on your development machine, then this may be unnecessary)

Here is a somewhat direct translation:

    uint16_t udivmodhi4(uint16_t arg1, uint16_t arg2) {
        
        uint16_t rem = 0;
        
        uint8_t i = 16;
        uint8_t carry = 0;
        uint8_t carry2 = 0;
        
        do {
            carry2 = (arg1 & 0x8000) != 0;
            arg1 = (arg1 << 1) + carry;
            i--;
        
            rem = (rem << 1) + carry2;
            carry = arg2 > rem;
            if (!carry) {
                 rem = rem - arg2;
            }
        }
        while (i);
        
        arg1 = (arg1 << 1) + carry;
        
        arg1 = arg1 ^ 0xffff;
    
        // arg1 has the quotient, rem has the remainder
        return arg1;
        //return rem;
    }

And here is my cleaned up version:

uint16_t udivmodhi4(uint16_t arg1, uint16_t arg2) {
    uint16_t rem = 0;

    for (uint8_t i = 0; i < 16; i++) {
        rem = (rem << 1) | (arg1 & 0x8000 ? 1 : 0);
        arg1 = arg1 << 1;
        if (rem >= arg2) {
            rem -= arg2;
            arg1 |= 1;
        }
    }
    return arg1;
    //return rem;
}

As you can see it loops 16 times*, and within each loop it takes the highest bit from arg1, shifts it into the lowest bit of the remainder, compares the remainder arg2, and shifts that back onto arg1, subtracting arg2 from the remainder if necessary.

*:Note that the ASM sets the loop variable to 17 at the start, but decrements it before starting the loop, so it loops 16 times. Also, the ASM version inverts the bits going back onto arg1 and then flips them at the end. Most oddities like this in the code appear to be for optimising code size.

The c code is not going to optimise down to as few instructions like the ASM, and I only did it for learning purposes. Bottom line is, this does a 16 bit unsigned divide of any dividend and divisor in a loop of 16.

Related: njuffa has an answer on [How can I multiply and divide using only bit shifting and adding?](https://stackoverflow.com/a/32443307) showing bit-at-a-time division by shifting in pure C, and separately with inline asm for x86. Looks quite similar to your cleaned up version, but the initializer for `rem` doesn't involve `(arg1 & 0x8000 ? 1 : 0)`. (Which you could simplify to `arg1 >> 15`, although IDK if either way is more likely to help GCC see that it should shift a bit from the top of arg1 into the bottom of rem via `adc rem,rem`) — Peter Cordes, Aug 25 '21 at 04:29
Ah thanks for that link, I'd have saved myself the trouble if I'd noticed that. Neat algorithm. — thomasrutter, Aug 25 '21 at 13:32
The compiler looks like it optimizes all variants of (arg1 >> 15) or (arg1 >= 0x8000) or (arg1 & 0x8000 ? 1 : 0) equally, which is to say, not very well. It's not smart enough to translate it into a shift with carry. Looks like ASM is basically required for this to be an optimized solution. Still, it's a good learning opportunity. — thomasrutter, Aug 26 '21 at 22:21

mod (%) in AVR assembly - __divmodhi4

2 Answers2