30

When I was working on string::npos I noticed something and I couldn't find any explanation for it on the web.

(string::npos == ULONG_MAX)

and

(string::npos == -1)

are true.

So I tried this:

(18446744073709551615 == -1)

which is also true.

How can it be possible? Is it because of binary conversation?

Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
Bahadır
  • 454
  • 1
  • 4
  • 8
  • 7
    overflow :p you compare a unsigned and a signed value – Stargateur Nov 15 '16 at 10:59
  • 10
    This is not undefined behaviour. – rubenvb Nov 15 '16 at 11:02
  • 6
    18446744073709551615 = 2^64 -1 ... spooky coincidence? – lelloman Nov 15 '16 at 11:02
  • @George it is not UB. – Edgar Rokjān Nov 15 '16 at 11:02
  • 9
    It is _implementation defined_ behavior. The question is not really correct; it's not always true that (18446744073709551615 == -1) is true. – davmac Nov 15 '16 at 11:05
  • @davmac I don't think that's true, I think the result of unsigned overflow is defined. can you source that? https://stackoverflow.com/questions/5416414/signed-unsigned-comparisons#comment10174850_5416724 – Evan Carroll Jul 08 '18 at 18:37
  • Also @rubenvb ^ – Evan Carroll Jul 08 '18 at 18:39
  • @Bahadır check out my answer for more information =) – Evan Carroll Jul 08 '18 at 18:39
  • @EvanCarroll Hmm. `18446744073709551615` is a decimal literal, without a suffix. Its type should be `int`, `long int` or `long long int` or a _signed_ "extended integer type" (see Integer Literals [lex.icon]). Since "A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types", a program containing this expression is actually ill-formed unless it's `long long int` is large enough, in which case the expression should be `false` and not `true`. Typically compilers apply an unsigned type, hence implementation defined. – davmac Jul 08 '18 at 19:46
  • @EvanCarroll OTOH `18446744073709551615u == -1` is not ill-formed assuming that `unsigned long long` (or a suitable extended type) is large enough. It will only be `true`, though, if the size of the type it has is 64 bits - hence still implementation defined. – davmac Jul 08 '18 at 19:49
  • @davmac It's platform/arch-dependent, the spec defines the behavior. As are many operations in C and C++. The spec defines that assuming an unsigned long and signed long are 64bits the result should be `true` because unsigned overflow is allowed to occur. – Evan Carroll Jul 08 '18 at 19:53
  • @EvanCarroll some context has been lost here since an earlier comment (which seems to have been deleted) was claiming it was undefined behaviour, my comment was just to say that's not necessarily the case. Maybe you're right and "implementation defined" is the wrong terminology, but the point stands that the true/false value of the comparison depends on implementation details (the size of various int types). Also as I noted just above, the expression is _really_ either ill-formed or `false`, it can't ever properly be `true` - but various real compilers allow it, with a `true` value. – davmac Jul 08 '18 at 20:08

4 Answers4

39

18,446,744,073,709,551,615

This number mentioned, 18,446,744,073,709,551,615, is actually 2^64 − 1. The important thing here is that 2^64-1 is essentially 0-based 2^64. The first digit of an unsigned integer is 0, not 1. So if the maximum value is 1, it has two possible values: 0, or 1 (2).

Let's look at 2^64 - 1 in 64bit binary, all the bits are on.

1111111111111111111111111111111111111111111111111111111111111111b

The -1

Let's look at +1 in 64bit binary.

0000000000000000000000000000000000000000000000000000000000000001b

To make it negative in One's Complement (OCP) we invert the bits.

1111111111111111111111111111111111111111111111111111111111111110b

Computers seldom use OCP, they use Two's Complement (TCP). To get TCP, you add one to OCP.

1111111111111111111111111111111111111111111111111111111111111110b (-1 in OCP)
+                                                              1b (1)
-----------------------------------------------------------------
1111111111111111111111111111111111111111111111111111111111111111b (-1 in TCP)

"But, wait" you ask, if in Twos Complement -1 is,

1111111111111111111111111111111111111111111111111111111111111111b

And, if in binary 2^64 - 1 is

1111111111111111111111111111111111111111111111111111111111111111b

Then they're equal! And, that's what you're seeing. You're comparing a signed 64 bit integer to an unsigned 64bit integer. In C++ that means convert the signed value to unsigned, which the compiler does.

Update

For a technical correction thanks to davmac in the comments, the conversion from -1 which is signed to an unsigned type of the same size is actually specified in the language, and not a function of the architecture. That all said, you may find the answer above useful for understanding the arch/languages that support two's compliment but lack the spec to ensure results you can depend on.

Cyrille MASCART
  • 173
  • 2
  • 7
Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
  • 1
    Strictly speaking the bit representations of the two numbers before conversion being the same doesn't matter. Even with 1's complement or signed magnitude representations, the conversion of (signed) -1 to `unsigned long` will always result in `ULONG_MAX`. (The bit pattern will be the same _after_ conversion of course). – davmac Jul 08 '18 at 20:27
  • All the ones and zeroes in this answer are implementation defined (except the -1's). In C++ it also doesn't matter how these are represented in binary at all. – rubenvb Jul 09 '18 at 14:29
  • @davmac how come? ULONG_MAX==2^32-1 while ~1==2^32 -2. On a one's complement system -1==~1==(ULONG_MAX-1). – Red.Wave Feb 19 '19 at 11:40
  • @Red.Wave ~1==(ULONG_MAX-1) isn't correct. They may have the same representation with a one's complement representation but do not have the same value. Conversion of -1 to `unsigned long` is specified by the language to result in `ULONG_MAX`. Therefor if -1==~1 (as in one's complement) then ~1==ULONG_MAX, because that comparison applies _usual arithmetic conversions_ which converts -1 to `unsigned long` (which yields ULONG_MAX). – davmac Feb 19 '19 at 22:16
  • @davmac any citations plz? A nop conversion would not result in that equality. Actually ULONG_MAX=-0=~0, assuming nop coversion on ones complement system. Setting -0=-1 would result in skews in logics and difficulty in implementation that defeats the purpose of one's complement. – Red.Wave Feb 19 '19 at 22:30
  • 1
    @Red.Wave assuming nop conversion isn't correct. C99 6.3.1.3, "Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type" – davmac Feb 19 '19 at 22:35
  • @Red.Wave or, since this question is tagged C++ and not C, see https://stackoverflow.com/questions/2711522/what-happens-if-i-assign-a-negative-value-to-an-unsigned-variable – davmac Feb 20 '19 at 00:34
9

string::npos is defined as constexpr static std::string::size_type string::npos = -1; (or if it's defined inside the class definition that would be constexpr static size_type npos = -1; but that's really irrelevant).

The wraparound of negative numbers converted to unsigned types (std::string::size_type is basically std::size_t, which is unsigned) is perfectly well-defined by the Standard. -1 wraps to the largest representable value of the unsigned type, which in your case is 18446744073709551615. Note that the exact value is implementation-defined because the size of std::size_t is implementation-defined (but capable of holding the size of the largest possible array on the system in question).

rubenvb
  • 74,642
  • 33
  • 187
  • 332
2

According to the C++ Standard (Document Number: N3337 or Document Number: N4296) std::string::npos is defined the following way

static const size_type npos = -1;

where std::string::size_type is some unsigned integer type. So there is nothing wonderful that std::string::npos is equal to -1. The initializer is converted to the tyhpe of std::string::npos.

As for this equation

(string::npos == ULONG_MAX) is true,

then it means that the type std::string::npos has type in the used implementation unsigned long. This type is usually corresponds to the type size_t.

In this equation

(18446744073709551615 == -1)

The left literal has some unsigned integral type that is appropriate to store such a big literal. Thus the right operand is converted also to this unsigned type by propogating the sign bit. As the left operand represents itself the maximum value of the type then they are equal.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • According to the C++ language standard (at least C++11 as per http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf section 2.14.2) the left literal must have a _signed_ type, not an _unsigned_ type. Compilers which translate it to an unsigned type in cases where there is no suitable signed type appear to be doing so as an extension to the language. – davmac Feb 20 '19 at 00:43
0

This is all about signed overflow and the fact that negative numbers are stored as 2s complement. The means that to get the absolute value of a negative number, you invert all the bits and add one. Meaning when doing an 8 bit comparison 255 and -1 have the same binary value of 11111111. The same applies to bigger integers

https://en.m.wikipedia.org/wiki/Two%27s_complement

doron
  • 27,972
  • 12
  • 65
  • 103