4

I have been reading through the Optimizing C++ wikibook. In the faster operations chapter one of the advice is as follows:

Integer division by a constant

When you divide an integer (that is known to be positive or zero) by a constant, convert the integer to unsigned.

If s is a signed integer, u is an unsigned integer, and C is a constant integer expression (positive or negative), the operation s / C is slower than u / C, and s % C is slower than u % C. This is most significant when C is a power of two, but in all cases, the sign must be taken into account during division.

The conversion from signed to unsigned, however, is free of charge, as it is only a reinterpretation of the same bits. Therefore, if s is a signed integer that you know to be positive or zero, you can speed up its division using the following (equivalent) expressions: (unsigned)s / C and (unsigned)s % C.

I tested this statement with gcc and the u / C expression seems to perform consistently better than the s / c

The following example is also provided below:

#include <iostream>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <numeric>

using namespace std;

int main(int argc, char *argv[])
{

    constexpr int vsize = 1e6;
    std::vector<int> x(vsize);
    std::iota(std::begin(x), std::end(x), 0); //0 is the starting number

    constexpr int a = 5;

  auto start_signed = std::chrono::system_clock::now();
  int sum_signed = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        // signed is by default
        int v = rand() % 30 + 1985;   // v in the range 1985-2014

        sum_signed += v / a;
    }
  auto end_signed = std::chrono::system_clock::now();

  auto start_unsigned = std::chrono::system_clock::now();
  int sum_unsigned = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        int v = rand() % 30 + 1985;   // v in the range 1985-2014
        sum_unsigned += static_cast<unsigned int>(v) / a;
    }
  auto end_unsigned = std::chrono::system_clock::now();

  // signed
  std::chrono::duration<double> diff_signed = end_signed - start_signed;
  std::cout << "sum_signed: " << sum_signed << std::endl;
  std::cout << "Time it took SIGNED: " << diff_signed.count() * 1000 << "ms" << std::endl;

  // unsigned
  std::chrono::duration<double> diff_unsigned = end_unsigned - start_unsigned;
  std::cout << "sum_unsigned: " << sum_unsigned << std::endl;
  std::cout << "Time it took UNSIGNED: " << diff_unsigned.count() * 1000 << "ms" << std::endl;

  return 0;
}

You can compile and run the example here: http://cpp.sh/8kie3

Why is this happening?

phuclv
  • 37,963
  • 15
  • 156
  • 475
bergercookie
  • 2,542
  • 1
  • 30
  • 38
  • 6
    Unless used in a very tight inner loop that's dominating your runtime CPU use, micro optimizations like this are rarely worth the effort. And if they make the code harder to read I'd argue that they actually have negative gain regardless of some trivial performance increase. – Jesper Juhl Mar 25 '18 at 15:37
  • 2
    True, but how can the compiler be making this kind of optimization? – bergercookie Mar 25 '18 at 15:50
  • If I'm not mistaken, division in two's complement doesn't work the same way as unsigned division, meaning it either needs a more complex instruction on the CPU, or extra instructions to handle the sign bit. Casting to unsigned gets rid of the sign bit so the division is trivial. – patatahooligan Mar 25 '18 at 15:58
  • 5
    Look at the disassembly. It could be quite interesting. – n. m. could be an AI Mar 25 '18 at 16:05
  • As @n.m. points out, it is quite interesting. See [here](https://godbolt.org/g/EPJQmL) – Joseph Wood Mar 25 '18 at 16:11
  • 2
    @JesperJuhl, integer division is actually one of the most expensive arithmetic operations and many people have spent a lot of time figuring out how to make this faster. There is an awesome library called [libdivide](http://libdivide.com/) that specifically targets this topic. – Joseph Wood Mar 25 '18 at 16:25
  • 2
    @Joseph Wood - sure. But unless it's an actual performance bottleneck in your code I'd still argue that it's not worth worrying about. – Jesper Juhl Mar 25 '18 at 16:37
  • The test is timing mostly `rand()`. – n. m. could be an AI Mar 25 '18 at 19:47
  • If you increase the number of iterations to e.g., 1e8 you take that factor out.. then systematically the unsigned version is better – bergercookie Mar 25 '18 at 19:52
  • I once found a nice page online that explained how division by constant is optimized, but can't find it anymore. Anyway, main idea is to use fixed-point arithmetic and replace the `div` with a `mult` and a right shift. The case for signed int is slightly more complicated because right shift does not play well with negative numbers in 2-complement. – sbabbi Mar 25 '18 at 20:59
  • [Is there a difference in term of performance between “unsigned int” and “int”](https://stackoverflow.com/q/410982/995714) – phuclv Apr 16 '18 at 15:44

1 Answers1

6

After some toying around, I believe I've tracked down the source of the problem to be the guarantee by the standard that negative integer divisions are rounded towards zero since C++11. For the simplest case, which is division by two, check out the following code and the corresponding assembly (godbolt link).

constexpr int c = 2;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

Assembly:

signed_div(int):
  mov eax, edi
  shr eax, 31
  add eax, edi
  sar eax
  ret

unsigned_div(unsigned int):
  mov eax, edi
  shr eax
  ret

What do these extra instructions accomplish? shr eax, 31 (right shift by 31) just isolates the sign bit, meaning that if input is non-negative, eax == 0, otherwise eax == 1. Then the input is added to eax. In other words, these two instructions translate to "if input is negative, add 1 to it. The implications of the addition are the following (only for negative input).

  • If input is even, its least significant bit is set to 1, but the shift discards it. The output is not affected by this operation.

  • If input is odd, its least significant bit was already 1 so the addition causes a remainder to propagate to the rest of the digits. When the right shift occurs, the least significant bit is discarded and the output is greater by one than the output we'd have if we hadn't added the sign bit to the input. Because by default right-shift in two's complement rounds towards negative infinity, the output now is the result of the same division but rounded towards zero.

In short, even negative numbers aren't affected, and odd numbers are now rounded towards zero instead of towards negative infinity.

For non-power-of-2 constants it gets a bit more complicated. Not all constants give the same output, but for a lot of them it looks similar to the following (godbolt link).

constexpr int c = 3;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

Assembly:

signed_div(int):
  mov eax, edi
  mov edx, 1431655766
  sar edi, 31
  imul edx
  mov eax, edx
  sub eax, edi
  ret
unsigned_div(unsigned int):
  mov eax, edi
  mov edx, -1431655765
  mul edx
  mov eax, edx
  shr eax
  ret

We don't care about the change of the constant in the assembly output, because it does not affect execution time. Assuming that mul and imul take the same amount of time (which I don't know for sure but hopefully someone more knowledgeable than me can find a source on it), the signed version once again takes longer because it has extra instructions to handle the sign bit for negative operands.


Notes

  • Compilation was done on godbot using x86-64 GCC 7.3 with the -O2 flag.

  • Rounds towards zero behavior is standard-mandated since C++11. Before it was implementation defined, according to this cppreference page.

patatahooligan
  • 3,111
  • 1
  • 18
  • 27
  • "negative integer divisions are rounded towards zero since C++11" Could you please add some reference for this? – bergercookie Mar 25 '18 at 22:34
  • Did you see the note at the bottom? I linked to cppreference because it is known to be a reliable site and the official standard is not free; you have to link to a draft and it's usually less readable. Anyway, in the latest available C++17 draft ([here](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf)), in paragraph 8.6 (p131) it confirms this behavior, though it calls it truncating to zero. – patatahooligan Mar 26 '18 at 00:37
  • Ah, missed it. Thanks – bergercookie Mar 26 '18 at 09:40