2

I remember reading somewhere a while back that if you are dividing by a constant there is a way to turn it into a multiplication due to the properties of modular arithmetic. I have been working on the following function and it involves performing many divisions with the same divisor. I was wondering if I could speed it up using what I described above. If so how exactly?

int ubi_div_i64(ubigint_t* a, ubi_i64_t b, ubi_i64_t* rem)
{
    if(b == 0)
        return UBI_MATH_ERR;

    ubi_i64_t r = 0;

    for(size_t i = a->used; i-- > 0;)
    {

        ubi_i64_t out;
        __asm__("\t"
                "div %[d] \n\t"
                : "=a"(out), "=d"(r)
                : "a"(a->data[i]), "d"(r), [d]"r"(b)
                : "cc");
        a->data[i] = out;
    }
    if(rem)
        *rem = r;

    return ubi_strip_leading_zeros(a);
}

Note: The divisor is 64 bit and the dividend is 128 bit.

rcgldr
  • 27,407
  • 3
  • 36
  • 61
chasep255
  • 11,745
  • 8
  • 58
  • 115
  • I think this might have the answer for you http://stackoverflow.com/a/4891307/174614 – Tony The Lion Sep 12 '15 at 22:38
  • 2
    Please pick one of C and C++. Also, if I recall correctly this trick is not going to be an improvement for this form of division as it needs a multiplication as large as the dividend. – fuz Sep 12 '15 at 22:39
  • What is the expected range of divident and divisor? – huseyin tugrul buyukisik Sep 12 '15 at 22:47
  • And state your assembly language. – Lightness Races in Orbit Sep 12 '15 at 22:52
  • The numerator will be a 64 bit type plus whatever the remainder of the last operation was. So it needs to be 128 bit. The divisor is 64 bit. – chasep255 Sep 12 '15 at 22:53
  • 9
    Before you try to optimize with hard to understand/maintain code, you should have a look at the assembler your compiler generates. You might be surprised how good modern compiler optimize your code. Note that this only works if you use a wider type for the multiplication, than for the division. This because it is effectively a fixed-point multiplication with `1/dividend`. If you also need the remainder, it might very well take longer than a simple CPU-`DIV` instruction (depends on the architecture and CPU model, of course). – too honest for this site Sep 12 '15 at 22:53
  • I guess it wont work because the only reason I am even using assembly right now is to support 128 bit division. – chasep255 Sep 12 '15 at 22:55
  • You only just touch on modular arithmetic. The accepted answer here might be relevant, with a special case where the modulus is a prime number. http://stackoverflow.com/questions/12235110/modulo-of-division-oftwo-numbers – Weather Vane Sep 12 '15 at 23:10
  • 6
    I share Olaf's opinion, but make sure you enable full optimisation. Please pick none of C or C++ because this question is actually about intel x86 assembly. The assembly just so happens to be embedded into C or C++ code using a gcc/clang extension (which this question is also not about). – autistic Sep 12 '15 at 23:37
  • Seb, I don't think that's true, it's about C or C++ and you can do it in a way that the C or C++ standard guarantees will work as long as you use a type like `uint32_t` or similar. – Chris Beck Sep 13 '15 at 01:42
  • I guess he's indicating he wants to do it in assembly though. – Chris Beck Sep 13 '15 at 01:50
  • Don't guess! OP, please tag your question correctly and specify your intent so we can give an accurate answer. – fuz Sep 13 '15 at 07:52
  • 1
    Olaf explains one way of doing this. You effectively do a fixed point multiplication where 1/a = 0.xxxxxxx . The problem is that you're dealing with a 128 bit dividend. If the divisor is small enough, then the quotient can have more than 64 bits, in which case the fixed point multiply could require extended precision math. – rcgldr Sep 13 '15 at 08:08
  • 2
    I agree with Olaf etc. Are you sure that it is worthwhile optimising this part of the code. Perhaps better performance can be achieved elsewhere with less effort – Ed Heal Sep 13 '15 at 08:22

0 Answers0