
The standard library vector::size() gives a size_t, an unsigned number. In one of the CppCon talks I heard somebody (was it Chandler Carruth?) say that this is unfortunate and that it should rather use signed integers.

The background is that overflow is not defined for signed integers, so the compiler has much more leeway. In a talk Carruth showed how a uint8_t as a for loop index in bzip2 creates many more machine instructions on x86 than int8_t because it has to explicitly simulate the overflow with masks and shifts.
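For illustration, here is a minimal sketch of the kind of loop in question (not the actual bzip2 code; the function names and the n <= 127 assumption are made up for this example):

#include <cstdint>

// Hedged sketch: assumes n <= 127 so that both index types cover the range.
void scale_u(float *data, int n) {
  for (std::uint8_t i = 0; i < n; ++i)  // wrap-around at 255 is well defined,
    data[i] *= 2.0f;                    // so the compiler must preserve it
}

void scale_s(float *data, int n) {
  for (std::int8_t i = 0; i < n; ++i)   // overflow is undefined behaviour, so
    data[i] *= 2.0f;                    // the optimizer may assume it never happens
}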

In the code that I now work on, there are certain sizes which are strictly positive. These are represented as size_t. This seems decent because this shows that they cannot be negative. On the other hand, there is no need for the defined modular arithmetic, so as long as the signed integer is large enough (we only go up to about 200), the unsigned integer would have the wrong interface for the arithmetic that we want.

At some point in the code there are loops from 0 up to this size, and the loop indices are then subtracted and the absolute value is taken.
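The loops presumably looked roughly like this before the change described below (a hypothetical reconstruction, using the same Lt as in the snippet further down; std::abs is the <cstdlib> overload set, and d is a made-up variable name):

// Hypothetical reconstruction, not the actual code.
for (std::size_t t1 = 0; t1 < Lt; t1++) {
  for (std::size_t t2 = 0; t2 < Lt; t2++) {
    // t1 - t2 is computed in unsigned arithmetic, so it wraps around to a
    // huge value whenever t2 > t1, and std::abs has no unsigned overload.
    auto d = std::abs(t1 - t2);  // overload resolution fails, see below
  }
}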

When I compiled it with the more modern GCC 7, it could not resolve the proper overload of std::abs, because calling it with a size_t - size_t difference is apparently ambiguous. I have changed the code to use int for the loop indices:

for (int t1 = 0; t1 < Lt; t1++) {
  for (int t2 = 0; t2 < Lt; t2++) {

Now the abs(t1 - t2) works just fine. But the comparison t1 < Lt gives a warning because it is a comparison between signed and unsigned numbers.
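One way to make that warning go away (just a sketch, not necessarily the right approach; Lt_i is a made-up name, and std::abs still comes from <cstdlib>) is to convert the size to a signed value once, which is safe here because the sizes only go up to about 200:

// Sketch: the cast is safe as long as the size fits into an int.
const int Lt_i = static_cast<int>(Lt);

for (int t1 = 0; t1 < Lt_i; t1++) {
  for (int t2 = 0; t2 < Lt_i; t2++) {
    int d = std::abs(t1 - t2);  // plain int arithmetic, the int overload is chosen
  }
}

But it is not obvious to me that scattering such casts around is the right thing to do.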

What is the right approach?

  1. Use unsigned integers for everything that is non-negative and then use static_cast<int>() whenever I need to do a subtraction.
  2. Use signed integers for loop indices but unsigned integers for the sizes of the containers. Then use static_cast<int> in the comparisons.
  3. Just use signed integers everywhere. When other libraries return unsigned integers, just use static_cast<int> there in order to satisfy the warnings.
Martin Ueding
  • "*This seems decent because this shows that they cannot be negative*" But are they supposed to follow modular arithmetic? – juanchopanza Nov 14 '17 at 10:36
  • Use `ssize_t` (note the extra `s`) rather than `int` if you want a signed value. – Paul R Nov 14 '17 at 10:37
  • [This](https://stackoverflow.com/questions/10040884/signed-vs-unsigned-integers-for-lengths-counts) might be interesting. – Jabberwocky Nov 14 '17 at 10:38
  • @Martin Ueding All these considerations do not make sense. Just ignore them. If some entity cannot be negative then use an appropriate unsigned type. – Vlad from Moscow Nov 14 '17 at 10:39
  • Code generation issues are probably related to the use of a non-native word size. Since `size_t` usually has the size of a native word, it won't force the compiler to generate extra instructions. –  Nov 14 '17 at 10:43
  • I use approach 3., and I'm happy with it. I use signed integers everywhere, except bit manipulations. I recommend you do this too. (In my case, it is easier, as I don't use `std`, so I have my own containers, with signed sizes). With this approach, you can forget about signed/unsigned issues altogether. – geza Nov 14 '17 at 10:45
  • @VladfromMoscow: yes. and then do bugs like `for (unsigned int i=0; i – geza Nov 14 '17 at 10:48
  • I don't really see how this can be answered in any useful manner. The answer will obviously be _it depends on your use case_ or _benchmark and see_. – Passer By Nov 14 '17 at 10:53
  • @juanchopanza: I thought about this, but did not write it. There is no modular arithmetic involved, and after Carruth's talk I consider unsigned integers to be for modular arithmetic and not for non-negative values. – Martin Ueding Nov 14 '17 at 11:03
  • This code needs several CPU hours doing dense tensor contractions. The loops are no cost at all. – Martin Ueding Nov 14 '17 at 11:05
  • If it makes sense that an integral value representing a size be negative, it may make sense to use a `signed` value. If it doesn't make sense for a size to be negative, then it may make sense to use an `unsigned` value. It's an implementation decision. Either way, if you need to subtract sizes, you need to work out how to correctly handle that. Signed integral values give undefined behaviour on overflow (e.g. subtracting a large positive value from a large negative one). Unsigned integral values use modulo arithmetic - which is well-defined, but may not give the result you intend. – Peter Nov 14 '17 at 11:22

1 Answer


"In a talk Carruth showed how a uint8_t as a for loop index in bzip2 creates many more machine instructions on x86 than int8_t because it has to explicitly simulate the overflow with masks and shifts."

Well, if you can use either type, the for-range must be limited to [0, 127]. Just use int as the index type, then. It is by definition the natural type for basic math operations, and typically maps well to CPU registers.

Using types optimized for minimal storage will not generate the fastest math, no. That is not a surprise. You can't draw conclusions about signed versus unsigned based on such flawed setups.

"size_t - size_t gives ambiguous values"

Well, it doesn't, but it does use modular arithmetic. size_t(1) - size_t(2) == size_t(-1), but size_t(-1) is the largest possible value. This follows directly from the definition of modular math: x-1 < x, except when x-1 wraps around because x == 0. (Or equivalently, x+1 > x except when x+1 == 0.)
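A small self-contained illustration of that wrap-around (the concrete printed value assumes a 64-bit size_t):

#include <cstddef>
#include <iostream>

int main() {
  std::size_t a = 1, b = 2;
  std::cout << a - b << '\n';                      // 18446744073709551615 on a
                                                   // typical 64-bit platform
  std::cout << (a - b == std::size_t(-1)) << '\n'; // prints 1: the wrapped
                                                   // value equals size_t(-1)
}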

Calling abs(size_t(x)) is therefore also pointless, since every size_t value is already non-negative. And comparing signed integers against size_t is equally fraught with unintended consequences. Explicit casts are good, as they make the consequences clear.
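For illustration, here is a minimal example of such a mixed comparison going silently wrong, together with the explicit cast that states the intent (with warnings enabled, the first comparison also triggers the same signed/unsigned warning as in the question):

#include <iostream>
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};
  int i = -1;
  // i is converted to the unsigned type of v.size() and becomes a huge
  // value, so the comparison is false even though -1 < 3 mathematically.
  std::cout << (i < v.size()) << '\n';                    // prints 0
  // The explicit cast makes the intended interpretation visible.
  std::cout << (i < static_cast<int>(v.size())) << '\n';  // prints 1
}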

But there is no universal solution to automatically figure out which cast should be applied. If a mechanical rule could be invented, we could have left that rule to the compiler. We haven't, because we can't. You, as the programmer, will have to consider the numbers involved in each case.

MSalters