38

In C or C++ it is said that the maximum number a size_t (an unsigned int data type) can hold is the same as casting -1 to that data type. for example see Invalid Value for size_t

Why?

I mean, (talking about 32 bit ints) AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 to form a negative number). then, 1 is 0x00000001.. 0x7FFFFFFFF is the greatest positive number a int data type can hold.

Then, AFAIK the binary representation of -1 int should be 0x80000001 (perhaps I'm wrong). why/how this binary value is converted to anything completely different (0xFFFFFFFF) when casting ints to unsigned?? or.. how is it possible to form a binary -1 out of 0xFFFFFFFF?

I have no doubt that in C: ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is equally true than 1 + 1 == 2, I'm just wondering why.

Community
  • 1
  • 1
royconejo
  • 2,213
  • 3
  • 24
  • 29
  • 17
    Read about "Two's complement" on Wikipedia; this is the most common way of encoding negative numbers in binary. – Artelius Dec 07 '09 at 21:50
  • 7
    http://en.wikipedia.org/wiki/Two%27s_complement –  Dec 07 '09 at 21:50
  • 1
    You'll notice that, just as with unsigned numbers, adding 1 to the highest possible number will give you the lowest possible number. – Drew Dormann Dec 08 '09 at 03:04
  • 1
    The binary representation of negative one has to be that representation that yields zero when one is added to it. This is `0xFFFFFFFF`. – David Schwartz Jul 24 '15 at 20:21

6 Answers6

52

C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.

For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression

(size_t) (SIZE_MAX + 1)

is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.

Note that the above holds true:

  • for all unsigned types,
  • even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.

Also, the above means that you can't rely on specific representations for signed types.

Edit: In order to answer some of the comments:

Let's say we have a code snippet like:

int i = -1;
long j = i;

There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 64-bit systems), the bit-patterns at memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are -1.

Similarly, when we do:

size_t s = (size_t) -1

There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.

If we do:

long j = LONG_MAX;
int i = j;

If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).

Alok Singhal
  • 93,253
  • 21
  • 125
  • 158
  • 6
    Voted up because you're the first person who noted that `(size_t)-1` is because of the arithmetic rules that C specifies for unsigned numbers, not because of the underlying representation. (By the way, `SIZE_MAX` is the macro). – caf Dec 07 '09 at 22:06
  • Thanks! I didn't want to use `SIZE_MAX` because it's not in C89, and also because I was trying to make a general point. Still, I think I could have mentioned it. – Alok Singhal Dec 07 '09 at 22:11
  • "even if the underlying machine doesn't represent numbers in two's complement" - given that two's complement is a way of representing _signed_ numbers, how would it ever apply to unsigned integers? – Pavel Minaev Dec 07 '09 at 22:31
  • If the standard guarantees that SIZE_T_MAX+1 == 0, does that automatically translate to (size_t)-1 == SIZE_T_MAX? I'm not sure it does, especially if the compiler needs to emit special code to check for that case. I'm not even sure I would guarantee (size_t) 0 - (size_t) 1 == SIZE_T_MAX. Maybe the standard has more to say on the subject than you've implied, I don't have it to hand. – Mark Ransom Dec 07 '09 at 23:12
  • 3
    Mark: "Unsigned integers shall obey the laws of arithmetic modulo 2**n where n is the number of bits in the value representation of that particular size of integer." [3.9.1/4, C++03] –  Dec 07 '09 at 23:21
  • Edited my post for clarification. Also see Roger's reply. – Alok Singhal Dec 07 '09 at 23:33
  • @Roger, thanks for that. Seems to say (size_t)0 - (size_t)1 will give the desired answer. Still not convinced that it applies to (size_t)-1 however, since -1 is outside the range handled by size_t. P.S. should we argue about angels on the head of a pin after this? – Mark Ransom Dec 07 '09 at 23:47
  • 1
    @Alok, your edit better answered my question. To say it in simple words, no matter the internal representation of a binary number; it is irrelevant for C and its integer arithmetic rules. In conclusion, given that there are different hardware representations for negative integer, there is no way to manipulate them at the bit level in C to "make" negative numbers out of their bits, is this correct? – royconejo Dec 08 '09 at 05:11
  • @conejoroy: It's possible to do it, but you have to be extremely careful, and the solution is most likely not going to be portable. First, you need to know the size of the underlying object (easy, sizeof will do that). You also need to know the endianness of the machine (also fairly easy, although there are some gotchas - and there might be systems with very "weird" endianness). Then, your compiler may add padding bits and/or have trap representations (bit-patterns that don't make any sense). So, it's best to not do it. – Alok Singhal Dec 08 '09 at 10:06
  • +1 for the clear answer. Liking how you stress the *value* aspect. – Johannes Schaub - litb Dec 08 '09 at 20:52
  • 1
    regarding "most(all?) 32 bit systems", I would say "very few 32 bit systems". Typically `int` and `long` are both 32-bit on those systems. You would get a mismatch on 16-bit systems (16/32), or some 64-bit systems (32/64). – M.M Jul 24 '15 at 12:01
  • @MattMcNabb nice. I probably meant to say 64-bit instead of 32-bit. – Alok Singhal Jul 24 '15 at 13:34
32

It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.

As to why it's done this way, Wikipedia says:

The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • 6
    P.S. I worked on a one's complement machine once. It was weird, having both positive zero and negative zero. – Mark Ransom Dec 07 '09 at 22:05
  • It never occurred to me that floating point has negative zero, but I see that you are right: http://en.wikipedia.org/wiki/Signed_zero – Mark Ransom Dec 07 '09 at 22:26
  • The old Control Data Cyber series used one's complement. The machine language was odd: the standard comparison instruction would treat +0 as greater than -0, the standard equality instruction would treat them as unequal, but there was an "is it zero?" instruction that in essence returned true for -0 and +0 and false for anything else. Fortunately, I never had to deal much with that. Equally fortunately, I never had to do low-level text processing, since it fit 6-bit characters 10 to a machine word, and lowercase was handled by a mixed 6- and 12-bit representation. – David Thornley Dec 07 '09 at 22:32
  • 5
    The answer doesn't change even if you're on a ones' complement machine, or any other weird machine. For details, please see my answer. As a minor nit, it's "ones' complement", not "one's complement". From Knuth: A two's complement number is complemented with respect to a single power of 2, while a ones' complement number is complemented with respect to a long sequence of 1s. Indeed, there is also a "twos' complement notation," which has radix 3 and complementation with respect to (2...22)_3. – Alok Singhal Dec 07 '09 at 22:33
  • @David: Congratulations for guessing the machine where I first learned assembly. @Alok: I wasn't aware that the standard made guarantees like this, I've never used C or C++ on any machine that didn't use two's complement. I'm guessing they're extremely rare. – Mark Ransom Dec 07 '09 at 23:08
  • @Mark Ransom: Perhaps it's more accurate to say "make a number negative" or even "switch the sign of a number" than "make a negative number". As long as the highest bit is set, you've made a negative number. – Drew Dormann Dec 08 '09 at 02:46
  • @David: on the Control Data's, you could cheat a bit:if you added zero to a number, it would turn negative zero into positive zero, which made comparisons simple and straightforward. – Jerry Coffin Dec 08 '09 at 03:00
7

Your first question, about why (unsigned)-1 gives the largest possible unsigned value is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer."

Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
3

Two's complement is very nice for doing subtraction just like addition :)

    11111110 (254 or -2)
   +00000001 (  1)
   ---------
    11111111 (255 or -1)

    11111111 (255 or -1) 
   +00000001 ( 1)
   ---------
   100000000 ( 0 + 256)
pmg
  • 106,608
  • 13
  • 126
  • 198
1

That is two's complement encoding.

The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.

Goz
  • 61,365
  • 24
  • 124
  • 204
-2

Because the bit pattern for an int -1 is FFFFFFFF in hexadecimal unsigned. 11111111111111111111111111111111 binary unsigned. But in int the first bit signifies whether it is negative. But in unsigned int the first bit is just extra number because a unsigned int cannot be negative. So the extra bit makes an unsigned int able to store bigger numbers. As with an unsigned int 11111111111111111111111111111111 (binary) or FFFFFFFF (hexadecimal) is the biggest number a uint can store. Unsigned Ints are not recommended because if they go negative then it overflows and goes to the biggest number.

trinalbadger587
  • 1,905
  • 1
  • 18
  • 36
  • 3
    You've just restated OP's observation and he is asking *why* the bit pattern for an int is that; and also *why* (unsigned int)-1 gives `0xFFFFFFFF` (which your answer does not answer clearly). Also, that is controversial advice to not recommend using unsigned ints. They are very commonly used. Both signed ints and unsigned ints have pitfalls. My opinion is that the unsigned ints have fewer pitfalls than the signed ints, so I prefer to use them unless I know that negative values are required. – M.M Jul 24 '15 at 12:07