In order to understand for myself how this function worked, I wrote a version of it in C.
(If you have the means to step through AVR assembler on your development machine, this may be unnecessary.)
Here is a somewhat direct translation:
uint16_t udivmodhi4(uint16_t arg1, uint16_t arg2) {
    uint16_t rem = 0;
    uint8_t i = 16;
    uint8_t carry = 0;
    uint8_t carry2 = 0;
    do {
        carry2 = (arg1 & 0x8000) != 0;  // top bit about to be shifted out of arg1
        arg1 = (arg1 << 1) + carry;     // shift in the previous compare result
        i--;
        rem = (rem << 1) + carry2;
        carry = arg2 > rem;             // set when arg2 does not fit into rem
        if (!carry) {
            rem = rem - arg2;
        }
    }
    while (i);
    arg1 = (arg1 << 1) + carry;
    arg1 = arg1 ^ 0xffff;               // the bits were collected inverted, so flip them
    // arg1 has the quotient, rem has the remainder
    return arg1;
    //return rem;
}
And here is my cleaned-up version:
uint16_t udivmodhi4(uint16_t arg1, uint16_t arg2) {
    uint16_t rem = 0;
    for (uint8_t i = 0; i < 16; i++) {
        // move the top bit of arg1 into the bottom bit of rem
        rem = (rem << 1) | (arg1 & 0x8000 ? 1 : 0);
        arg1 = arg1 << 1;
        if (rem >= arg2) {
            rem -= arg2;
            arg1 |= 1;  // this quotient bit is 1
        }
    }
    return arg1;
    //return rem;
}
As you can see, it loops 16 times*. On each pass it takes the highest bit of arg1 and shifts it into the lowest bit of the remainder, compares the remainder with arg2, and shifts the result of that comparison back into the lowest bit of arg1, subtracting arg2 from the remainder whenever the remainder is at least arg2.
*: Note that the ASM sets the loop counter to 17 at the start but decrements it before the first pass through the loop body, so the body runs 16 times. Also, the ASM version shifts inverted bits back into arg1 and then flips them all at the end. Most oddities like this in the code appear to be there to optimise code size.
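To isolate the inverted-bit trick from the footnote, here is a minimal sketch. It assumes, as the direct translation above does with `carry = arg2 > rem`, that the compare leaves the carry set exactly when the remainder is smaller than the divisor. The names `udiv16_inverted` and `check_udiv16_inverted` are made up for this illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: collects the raw carry and complements once at the end. */
uint16_t udiv16_inverted(uint16_t num, uint16_t den) {
    uint16_t rem = 0;
    for (uint8_t i = 0; i < 16; i++) {
        /* Top bit of the dividend moves into the bottom of the remainder. */
        rem = (uint16_t)((rem << 1) | (num >> 15));
        /* carry == 1 means rem < den, i.e. this quotient bit is 0. */
        uint8_t carry = rem < den;
        if (!carry) {
            rem = (uint16_t)(rem - den);
        }
        /* Shift the carry in unchanged: num accumulates the inverted quotient. */
        num = (uint16_t)((num << 1) | carry);
    }
    /* A single complement recovers the real quotient. */
    return (uint16_t)~num;
}

/* Compare against the C division operator on a strided sample of inputs.
   A divisor of 0 is skipped because / is undefined for it. */
void check_udiv16_inverted(void) {
    for (uint32_t n = 0; n <= 0xFFFF; n += 251) {
        for (uint32_t d = 1; d <= 0xFFFF; d += 127) {
            assert(udiv16_inverted((uint16_t)n, (uint16_t)d) == n / d);
        }
    }
}
```

In C the comparison has to be written out either way, so nothing is gained here. The point is only that feeding the carry straight back in and complementing once gives the same quotient as setting each bit directly.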
The C code is not going to optimise down to as few instructions as the ASM, and I only wrote it for learning purposes. Bottom line: this does a 16-bit unsigned divide of any dividend and divisor in 16 loop iterations.
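If you want to convince yourself of that on a desktop machine, a sketch along these lines may help. It runs the same shift-and-subtract loop but returns both the quotient and the remainder, then compares them with C's `/` and `%`. The struct `udiv16_result` and the function names `udivmod16` and `check_udivmod16` are hypothetical, chosen just for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, used here only for illustration. */
typedef struct {
    uint16_t quot;
    uint16_t rem;
} udiv16_result;

/* Same loop as the cleaned-up version, but returning both results. */
udiv16_result udivmod16(uint16_t num, uint16_t den) {
    uint16_t rem = 0;
    for (uint8_t i = 0; i < 16; i++) {
        /* Move the top bit of the dividend into the bottom of the remainder. */
        rem = (uint16_t)((rem << 1) | (num >> 15));
        num = (uint16_t)(num << 1);
        if (rem >= den) {
            rem = (uint16_t)(rem - den);
            num |= 1;  /* record a 1 bit in the quotient */
        }
    }
    udiv16_result r = { num, rem };
    return r;
}

/* Check against the C operators on a strided sample of inputs.
   A divisor of 0 is skipped because / and % are undefined for it. */
void check_udivmod16(void) {
    for (uint32_t n = 0; n <= 0xFFFF; n += 251) {
        for (uint32_t d = 1; d <= 0xFFFF; d += 127) {
            udiv16_result r = udivmod16((uint16_t)n, (uint16_t)d);
            assert(r.quot == n / d);
            assert(r.rem == n % d);
        }
    }
}
```

For example, `udivmod16(100, 7)` yields a quotient of 14 and a remainder of 2. Calling `check_udivmod16()` from a test program will abort on the first mismatch, if there is one.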