
When comparing an unsigned value, as in this test:

if (pos == (size_t)-1)

Is this comparison technically different from something like:

if (pos == (size_t)~0)

I am not used to the second variant, which is why I am asking. The answer may be rather straightforward if it's yes.

Peter Mortensen
yves Baumes
  • Well `-1` converted to unsigned is [always guaranteed to be UMAX](http://stackoverflow.com/questions/22801069/using-1-as-a-flag-value-for-unsigned-size-t-types/22801135#22801135) I don't think we can say that about `~0` b/c it depends on the underlying representation. – Shafik Yaghmour Jul 09 '14 at 09:33
  • The (straightforward) answer is no (there is no difference). – barak manos Jul 09 '14 at 11:40
  • @barakmanos: Then straightforward means wrong? – Deduplicator Jul 09 '14 at 12:12
  • Not an exact duplicate, but very closely related to http://stackoverflow.com/questions/809227/is-it-safe-to-use-1-to-set-all-bits-to-true – Adrian McCarthy Jul 09 '14 at 17:52

3 Answers


The C++ standard guarantees that size_t is an unsigned type, that unsigned types obey the usual modular arithmetic rules (where the modulus is two to the number of bits in the value representation of the type, cf. 3.9/4), and thus -1 converted to size_t must be the largest value which that type can represent.
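The guarantee described above can be checked at compile time. This is a minimal sketch; the function name `all_ones_from_minus_one` is made up for illustration and does not appear in the original answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Converting -1 to an unsigned type wraps modulo 2^N, so the result is the
// largest value of that type on every conforming implementation.
static_assert(static_cast<std::size_t>(-1) ==
                  std::numeric_limits<std::size_t>::max(),
              "(size_t)-1 must be the maximum size_t value");
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX,
              "(size_t)-1 must equal SIZE_MAX");

// Hypothetical helper returning the same value at run time.
std::size_t all_ones_from_minus_one() {
    return static_cast<std::size_t>(-1);
}
```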

The value 0 is an int, and ~0 has all the bits in the int representation of zero flipped. The value of that result depends on the representation of int on your platform. That value (which may be a trap representation, thanks @Matt McNabb) is then converted to size_t (which is done following the rules of modular arithmetic).

In conclusion, whether the resulting values compare equal is implementation-defined. (For example, if int is represented in two's complement, then the value of ~0 is -1, so the two are the same.)
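To make the order of operations concrete, here is a small sketch showing that `~0` is evaluated as an `int` before any conversion to `size_t` takes place. The function names are hypothetical, and the value noted in the comments assumes a two's complement `int`, which is what mainstream platforms use.

```cpp
#include <cstddef>
#include <type_traits>

// The complement is applied to an int; size_t is not involved yet.
static_assert(std::is_same<decltype(~0), int>::value, "~0 has type int");

// Hypothetical helper: the int value of ~0.
// On a two's complement platform this is -1.
int complement_of_zero() {
    return ~0;
}

// Hypothetical helper: only now is the int result converted to size_t,
// using the modular rule for unsigned conversions.
std::size_t complement_converted() {
    return static_cast<std::size_t>(~0);
}
```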

Kerrek SB
  • @KerrekSB You say "depends on the representation of `int` on your platform". Out of interest, what if the inversion happens after the cast i.e. `~((size_t)0)`. Would this remove the dependency on the implementation of the `int`? I.e. would there be stronger guarantees that `(size_t)-1` and `~((size_t)0)` would be equivalent? (Typo fixes from my last comment) – Niall Jul 09 '14 at 12:49
  • @Niall: I believe so, though I haven't all the pieces together just now. The `~` operator is defined as one's complement, and the value of an unsigned integer is the binary value of its value representation (this is the part I'm missing), so it follows that one's complement of zero is "all ones", which is the maximal representable value. – Kerrek SB Jul 09 '14 at 12:59
  • I think it may be required by the standard. From the C++ draft; section 3.9.1, paragraph 7; "The representations of integral types shall define values by use of a pure binary numeration system. [ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ]" +1 I find this an interesting question/answer pair. – Niall Jul 09 '14 at 14:12
  • @Niall: It's certainly hard to see how it could be anything else, especially if you combine 3.9.1/4, 3.9.1/7 and footnote 52. Basically, an unsigned integer has N value bits, and the value of the bits determines the value of the integer in the usual way. It's really only the signed integers that leave any room for implementation details. (And of course the fact that integers needn't be uniquely represented; they may well have padding.) – Kerrek SB Jul 09 '14 at 14:31
  • Note that in one's complement, `~0` may be a trap representation (negative zero) – M.M Jul 10 '14 at 00:23
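The variant raised in the comments, complementing after the cast, might look like the sketch below. The function name is hypothetical, and the sketch assumes that `size_t` is at least as wide as `int`, so the operand is not promoted to `int` before `~` is applied.

```cpp
#include <cstddef>
#include <limits>

// The operand is already unsigned, so ~ flips the value bits of a size_t
// and the signed representation of int plays no part.
// Assumption: size_t is not narrower than int (no integral promotion).
std::size_t all_ones_unsigned() {
    return ~static_cast<std::size_t>(0);
}
```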

Assuming (guaranteed by the standard) that size_t refers to an unsigned integer value, this:

if(pos == (size_t)~0)

used with the intent to be equivalent to:

if(pos == (size_t)-1)

is assuming that the machine uses a 2's complement representation for negative integers. The standard doesn't enforce it so you shouldn't assume it if you want your code to be 100% portable.
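One way to avoid the question entirely is to compare against a named constant instead of a cast literal. The sketch below assumes that `pos` comes from `std::string::find`, which the question does not state; `not_found` and `kInvalidPos` are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical sentinel for code that does not use std::string.
constexpr std::size_t kInvalidPos = std::numeric_limits<std::size_t>::max();

// Hypothetical helper: true when `needle` does not occur in `haystack`.
// std::string::npos is the library's own "no position" constant.
bool not_found(const std::string& haystack, const std::string& needle) {
    const std::size_t pos = haystack.find(needle);
    return pos == std::string::npos;
}
```

Neither spelling depends on how negative integers are represented.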

Community
Marco A.

So, in your example there is in practice no difference whatsoever, because it is hard to find a compiler that will not constant-fold operations on literals like -1 and ~0. With your example, I got exactly:

        ; ...
        movq    $-1, -16(%rbp)
        movq    $-1, -8(%rbp)
        ; ...

Don't be afraid of those -1's; assembly is typeless ;)

A more interesting question is what happens if your example is changed to:

#include <stddef.h>
int main() {
        int var0 = 0;
        int var1 = 1;
        size_t a = (size_t) -var1;
        size_t b = (size_t) ~var0;
        return a ^ b;
}

In my case (Kubuntu, gcc 4.8.2, x86_64, -O0 option) the part of interest was:

        movl    $0, -24(%rbp)    ; var0 = 0
        movl    $1, -20(%rbp)    ; var1 = 1

        movl    -20(%rbp), %eax
        negl    %eax             ; 2's complement negation

        ; ...

        movl    -24(%rbp), %eax
        notl    %eax             ; 1's complement negation

        ; ...

Looking into Intel's manual:

NEG - Two's Complement Negation

Replaces the value of operand (the destination operand) with its two's complement. (This operation is equivalent to subtracting the operand from 0.)

NOT - One's Complement Negation

Performs a bitwise NOT operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand and stores the result in the destination operand location.

My conclusion is that, theoretically, the code could differ on some exotic platform and compiler, but otherwise it does not. And if unsure, always check the assembly listing on your platform.
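As a complement to reading the assembly, the assumption can also be stated in the source so that the build fails on a platform where it does not hold. This is only a sketch, and `is_sentinel` is a hypothetical name.

```cpp
#include <cstddef>

// Fails to compile on an implementation where the two spellings differ.
static_assert(static_cast<std::size_t>(~0) == static_cast<std::size_t>(-1),
              "(size_t)~0 and (size_t)-1 differ on this platform");

// Hypothetical helper that uses the ~0 spelling, guarded by the check above.
bool is_sentinel(std::size_t pos) {
    return pos == static_cast<std::size_t>(~0);
}
```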

hurufu