
I, for one, find it more difficult to divide bigger numbers than smaller ones.

I was wondering if the same is true for the CPU. Is there any hardware that allows smaller numbers to be divided faster than larger ones?

e.g.:

100 / (uint8_t) 255
100 / (char) 255
100 / (int) 255
100 / (int) 2147483647
100 / (long) 2147483647
100 / (long) 9223372036854775807

I would imagine the process of division requires more steps for bigger numbers, and thus more work by the ALU. I am unsure whether the data types themselves have any influence on this.

Would there be any noticeable difference between dividing these numbers? Is it generally better to divide smaller numbers when given the chance (is this a form of optimization)?

Some Dinosaur
  • Yes, some operations may be slower on larger data types, especially on low powered CPUs that may not have dedicated hardware for all operations. – Joachim Isaksson Dec 22 '19 at 11:28
  • There are certain situations where the compiler is able to optimize operations such as multiplication and division. You can read more here: https://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Code_optimization/Faster_operations#Operations_with_powers_of_two as well as this question https://stackoverflow.com/questions/6357038/is-multiplication-and-division-using-shift-operators-in-c-actually-faster – Soenderby Dec 22 '19 at 12:00

1 Answer


It depends on the internal implementation. Consider the following naive binary long division:

struct div_t {
    int quot;
    int rem;
};

//assumes divisor != 0 and neither operand is INT_MIN
struct div_t div(int dividend, int divisor) {
    _Bool dividend_is_negative = (dividend < 0),
          divisor_is_negative = (divisor < 0),
          result_is_negative = divisor_is_negative ^ dividend_is_negative;
    unsigned quotient = 0, shifted;
    int shift;
    //branchless equivalent of: if (divisor_is_negative) divisor = -divisor;
    divisor ^= -divisor_is_negative;
    divisor += divisor_is_negative;
    //branchless equivalent of: if (dividend_is_negative) dividend = -dividend;
    dividend ^= -dividend_is_negative;
    dividend += dividend_is_negative;
    shifted = divisor;
    //shift divisor so its MSB matches the dividend's - minimizes loop iterations
    //if no builtin clz, shift divisor up until it reaches the dividend instead,
    //such as: while (shifted < (unsigned)dividend) shifted <<= 1;
    shift = __builtin_clz(divisor) - __builtin_clz(dividend);
    //clamp shift to 0 (shift must be signed for this to work) to avoid
    //undefined behavior when the divisor already has more leading zeros
    shift &= -(shift > 0);
    shifted <<= shift;
    do {
        unsigned tmp;
        quotient <<= 1;
        tmp = (unsigned)(shifted <= (unsigned)dividend);
        quotient |= tmp;
        //branchless equivalent of: if (tmp) dividend -= shifted;
        dividend -= shifted & -tmp;
        shifted >>= 1;
    } while (shifted >= (unsigned)divisor);
    //branchless equivalent of: if (result_is_negative) quotient = -quotient;
    quotient ^= -result_is_negative;
    quotient += result_is_negative;
    //the remainder takes the sign of the dividend (as in C), not of the result
    dividend ^= -dividend_is_negative;
    dividend += dividend_is_negative;
    return (struct div_t){quotient, dividend};
}

Obviously for smaller numbers the loop terminates sooner in the naive implementation.

If the whole loop were unrolled and parallelized in circuitry, it would use a lot of chip area, since each bit depends on the higher bits. That is often worth it for desktops, servers and supercomputers, but not so much for mobile, embedded or IoT parts.

Many configurable chip designs (RISC-V, J-Core, etc.) have options for fast vs. slow division, while some leave the divider out altogether and let the compiler emit something like my example.

technosaurus