Speeding up integer division with constant divisor

Question

I remember reading somewhere a while back that if you are dividing by a constant there is a way to turn it into a multiplication due to the properties of modular arithmetic. I have been working on the following function and it involves performing many divisions with the same divisor. I was wondering if I could speed it up using what I described above. If so how exactly?

int ubi_div_i64(ubigint_t* a, ubi_i64_t b, ubi_i64_t* rem)
{
    if(b == 0)
        return UBI_MATH_ERR;

    ubi_i64_t r = 0;

    for(size_t i = a->used; i-- > 0;)
    {

        ubi_i64_t out;
        __asm__("\t"
                "div %[d] \n\t"
                : "=a"(out), "=d"(r)
                : "a"(a->data[i]), "d"(r), [d]"r"(b)
                : "cc");
        a->data[i] = out;
    }
    if(rem)
        *rem = r;

    return ubi_strip_leading_zeros(a);
}

Note: The divisor is 64 bit and the dividend is 128 bit.

I think this might have the answer for you http://stackoverflow.com/a/4891307/174614 — Tony The Lion, Sep 12 '15 at 22:38
Please pick one of C and C++. Also, if I recall correctly this trick is not going to be an improvement for this form of division as it needs a multiplication as large as the dividend. — fuz, Sep 12 '15 at 22:39
The numerator will be a 64 bit type plus whatever the remainder of the last operation was. So it needs to be 128 bit. The divisor is 64 bit. — chasep255, Sep 12 '15 at 22:53
Before you try to optimize with hard to understand/maintain code, you should have a look at the assembler your compiler generates. You might be surprised how good modern compiler optimize your code. Note that this only works if you use a wider type for the multiplication, than for the division. This because it is effectively a fixed-point multiplication with `1/dividend`. If you also need the remainder, it might very well take longer than a simple CPU-`DIV` instruction (depends on the architecture and CPU model, of course). — too honest for this site, Sep 12 '15 at 22:53
I guess it wont work because the only reason I am even using assembly right now is to support 128 bit division. — chasep255, Sep 12 '15 at 22:55
You only just touch on modular arithmetic. The accepted answer here might be relevant, with a special case where the modulus is a prime number. http://stackoverflow.com/questions/12235110/modulo-of-division-oftwo-numbers — Weather Vane, Sep 12 '15 at 23:10
I share Olaf's opinion, but make sure you enable full optimisation. Please pick none of C or C++ because this question is actually about intel x86 assembly. The assembly just so happens to be embedded into C or C++ code using a gcc/clang extension (which this question is also not about). — autistic, Sep 12 '15 at 23:37
Seb, I don't think that's true, it's about C or C++ and you can do it in a way that the C or C++ standard guarantees will work as long as you use a type like `uint32_t` or similar. — Chris Beck, Sep 13 '15 at 01:42
I guess he's indicating he wants to do it in assembly though. — Chris Beck, Sep 13 '15 at 01:50
Don't guess! OP, please tag your question correctly and specify your intent so we can give an accurate answer. — fuz, Sep 13 '15 at 07:52
Olaf explains one way of doing this. You effectively do a fixed point multiplication where 1/a = 0.xxxxxxx . The problem is that you're dealing with a 128 bit dividend. If the divisor is small enough, then the quotient can have more than 64 bits, in which case the fixed point multiply could require extended precision math. — rcgldr, Sep 13 '15 at 08:08
I agree with Olaf etc. Are you sure that it is worthwhile optimising this part of the code. Perhaps better performance can be achieved elsewhere with less effort — Ed Heal, Sep 13 '15 at 08:22

Speeding up integer division with constant divisor

0 Answers0