26

I am learning C and have a dumb question regarding the "-1" in the value range for unsigned int and signed int. I can't seem to find an explanation for it anywhere.

The paragraph below explains the data range. However, it does not explain the "-1". What does "-1" represent/mean? Is it -1 because it skips 0 and 0 has no value?

In 32-bit integers, an unsigned integer has a range of 0 to 2^32 -1 = 0 to 4,294,967,295 or about 4 billion. The signed version goes from -2^31 -1 to 2^31, which is –2,147,483,648 to 2,147,483,647 or about -2 billion to +2 billion. The range is the same, but it is shifted on the number line.

Boann
  • 48,794
  • 16
  • 117
  • 146
biscuit
  • 353
  • 3
  • 10
  • 12
    Actually there is an error in "The signed version goes from **-2^31 -1** to **2^31** [...]". It has to be "The signed version goes from **-2^31** to **2^31 -1** [...]" like the rest says: "which is –2,147,483,648 to 2,147,483,647". – the busybee Aug 29 '19 at 05:41
  • @Yunnosch Agreed, but then the second part of the statement is erroneous. ;-) Anyway, the 2s complement representation is the most used implementation AFAIK. – the busybee Aug 29 '19 at 05:44
  • 3
    @Yunnosch Non-2's complement 32-bit min value would be `-2^31 + 1`, not `-2^31 -1`. Perhaps the original `()` missing as in `-(2^31 -1)` – chux - Reinstate Monica Aug 29 '19 at 05:50
  • @chux You are right. I should stay quiet. ;-) – Yunnosch Aug 29 '19 at 05:53
  • 2
    It's -1 because 0 *is* represented, and leaves one less bit pattern available to represent nonzero numbers. – dan04 Aug 29 '19 at 13:36
  • 4
    That’s irritating to read. The `-` is actually a minus here. Write `range of 0 to 2^32 - 1` or, better, `range of 0 to 2³²-1` instead. – mirabilos Aug 29 '19 at 15:52
  • Related question on Super User: https://superuser.com/q/975684 – gronostaj Aug 29 '19 at 16:31
  • For unsigned arithmetic, since computers use a fixed sized structure to represent a number, the maximum number is bounded. Computers deal with overflow by using modulo M. For binary computers, the fixed sized structure will be some number of bits, N, and the modulo M will be 2^N. But, M mod M is 0, so the largest representable number is M-1. – jxh Aug 29 '19 at 18:44
  • 6
    It **literally** means to subtract 1 from the previous value. In the expression `2^32-1` it means, "calculate 2 to the 32nd power, then subtract 1." – Jim L. Aug 29 '19 at 19:40
  • 1
    What do you mean, "*What does "-1" represent/mean? Is it -1 because it skips 0 and 0 has no value?*" You learned negative numbers in grammar school. – RonJohn Aug 29 '19 at 20:25
  • The gist is that you can't have negative integers when using unsigned numbers. Asking what is -1 in signed integers is like asking "What is 1/2 as an integer value?" It just makes no sense. – MaxW Aug 29 '19 at 21:00
  • 2
    Where did you find that incorrect quote you're quoting? If it's on a wiki anywhere, it should get edited. – Peter Cordes Aug 30 '19 at 00:22
  • @RonJohn You seem like a pleasant guy. Math actually has nothing to do with grammar. I think you mean grade school. The point of the question is to understand why there is a -1. – biscuit Sep 04 '19 at 02:22
  • https://en.wikipedia.org/wiki/Grammar_school – Yunnosch Sep 04 '19 at 06:27
  • I know what they are. I was just returning the favor. – biscuit Sep 04 '19 at 21:04

5 Answers5

48

Consider the values you can achieve with 2 bits:

00 : 0
01 : 1
10 : 2
11 : 3

There are 4 of them, 2 to the power of 2.
But the highest value is not 4, it is 3.
The highest value is 2 to the power of 2 minus 1. I.e. in your representation

2^2-1
or 22-1

Add a bit and you get twice the number, by adding

100 : 4
101 : 5
110 : 6
111 : 7

Total number 8, but highest number 7.

So the "-1" is because always the first of the total of 2n is used for 0,
the 2nd is used for 1, the 3rd is used for 2.
In the end (2n)th one is not available for 2n, it is already used for 2n-1.

Yunnosch
  • 26,130
  • 9
  • 42
  • 54
26

n bits can represent 2n different values. (The first bit can have two values * the second bit can have two values * the third bit can have two values * ...)

For example, 3 bits can form 23 = 8 different bit patterns, and thus up to 8 different values.

000
001
010
011
100
101
110
111

If each bit pattern represents an integer, then an n-bit integer can represent 2n different integers. For example,

  • It could represent the integers from 0 to 2n-1 inclusively
    (because (2n-1) - (0) + 1 = 2n different values).

    For example,

      000   0
      001   1
      010   2
      011   3
      100   4
      101   5
      110   6
      111   7
    
  • It could represent the integers from -(2n-1) to 2n-1-1 inclusively
    (because (2n-1-1) - (-(2n-1)) + 1 = 2n different values).

    For example,

      100  -4
      101  -3
      110  -2
      111  -1
      000   0
      001   1
      010   2
      011   3
    

You could assign any meaning to these values, but the previously stated ranges are the ones understood by twos'-complement machines for unsigned integers and signed integers respectively.[1]


  1. On a ones'-complement machine, there are two ways of writing zero (0000...00002 and 1000...00002), so the range is only -(2n-1-1) to 2n-1-1. I think all modern machines are twos'-complement machines, though.
ikegami
  • 367,544
  • 15
  • 269
  • 518
  • 1
    As a curiosity: the significand of `double, float` is sign-magnitude. So being familiar with that format helps there. – chux - Reinstate Monica Aug 29 '19 at 05:56
  • I learned about better formatting for exponents in your answer. Thanks. – Yunnosch Aug 29 '19 at 06:31
  • ikegami, Detail: IEEE FP uses [_sign magnitude_](https://en.wikipedia.org/wiki/Signed_number_representations#Signed_magnitude_representation_(SMR)) like encoding, not _ones' complement_. Also note `'` location. – chux - Reinstate Monica Aug 29 '19 at 12:51
  • And the _exponent_ in IEEE floating points [is encoded](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) with an [offset](https://en.wikipedia.org/wiki/Offset_binary), where the all-zeroes pattern is the smallest possible value, and the pattern corresponding to zero is in the middle of the (raw unsigned binary) range. (Except that in IEEE FP, the all-zeroes pattern is special, so it's ...00001 that is the smallest exponent.) – ilkkachu Aug 29 '19 at 16:00
  • *2^n different values*. Not necessarily. It can hold 2^n different bit-patterns (in C: object representations). For one's complement or sign/magnitude encodings, there are two redundant encodings for the value `0`. (e.g. in IEEE floating point `-0.0` is usually considered the same value as `+0.0`, even though there are rules about sign-bit propagation for signed zeros...) IEEE 754 floats even spend a chunk of their bit-patterns on NaN encodings which don't represent any *value*. – Peter Cordes Aug 30 '19 at 00:18
  • Oh, I think that's just a terminology nitpick, using the word "values" to talk about bit-patterns / encodings / object representations. In C terminology, a *value* is the thing represented *by* the object-representation's bit pattern. e.g. `-1` represented by `0xFFFFFFFF` – Peter Cordes Aug 30 '19 at 00:22
  • @Peter Cordes, Re "*Not necessarily.*", Yes, necessarily. The fact that someone might choose to assign the same value to two different bit patterns doesn't mean it *can't* have 2^N different values; it just means it *doesn't* in that particular scheme. /// Re "*For one's complement*", That's already in my answer. /// Re "*Oh, I think that's just a terminology nitpick*", According to what you said, my use of "values" in in line with C terminology's meaning. "Negative one", "4 letter 'A'" and "39.5 oC" are values in the sense of "values" I used. – ikegami Aug 30 '19 at 00:49
  • I agree with the examples in the last sentence of your comment. But you use "values" in your answer to talk about bit-patterns. e.g. "*If each value represents an integer*". Values are what is represented. And your 2nd sentence in the first paragraph explains why there are 2^n bit-patterns right after saying 2^n values. I was assuming that the choice of encoding was also fixed so you might not have the option of choosing an encoding with no redundant or non-value encodings (e.g. trap representations of NaNs). `int` encoding is chosen by the C implementation, not the programmer. – Peter Cordes Aug 30 '19 at 01:06
  • @Peter Cordes, I overlooked that instance earlier. Fixed. – ikegami Aug 30 '19 at 01:10
  • @Peter Cordes, Re "*`int` encoding is chosen by the C implementation, not the programmer.*", Nope. You can assign whatever meaning to those bits that you want. I gave four examples above, and here's another: `enum Foo { FOO };` Of course, certain operators and functions will have certain expectations, and C decides how character and numeric literals are encoded, but that's something else. – ikegami Aug 30 '19 at 01:12
  • @ikegami: Are you suggesting that you could have an `enum` with 2^16 unique named values on a 16-bit one's complement machine? (I forget if C `enum` types grow in width if needed...) If you want manual bit-meaning, you should normally build that on top of `unsigned` integers as containers for raw bits. – Peter Cordes Aug 30 '19 at 01:16
  • Trap representations is something that I'm not very familiar with. I acknowledge there might be caveats. – ikegami Aug 30 '19 at 01:16
  • @Peter Cordes, Re "*Are you suggesting that you could have an enum with 2^16 unique named values on a 16-bit one's complement machine?*", No. I don't know if it can or not. I don't even know the max number of enums on a twos'-complement. – ikegami Aug 30 '19 at 01:17
  • A "trap representation" is just a bit-pattern that makes the machine raise an exception. e.g. I think I've read that some one's complement hardware disallowed the "negative zero" encoding of `0`. Creating it (e.g. with C type-punning from unsigned, or maybe even bitwise OR) and then using it as input to a signed-`add` asm instruction would raise an exception. C allows for HW like this by letting types have "trap representations". (And of course one's complement HW never produces that bit-pattern from signed math instructions. So it's normally unused, even if accepted as 0.) – Peter Cordes Aug 30 '19 at 01:20
  • https://en.cppreference.com/w/c/language/enum has some conflicting info. First of all "An enumerated type is a distinct type whose value is a value of its *underlying type*". And later "Each enumerated type is compatible with one of: char, a signed integer type, or an unsigned integer type.". But also "Each enumerator that appears in the body of an enumeration specifier becomes an integer constant with type `int`" seems to contradict that. But anyway, it's not a unique encoding separate from signed or unsigned types. Forward declarations aren't allowed, so width could vary by enum. – Peter Cordes Aug 30 '19 at 01:25
  • The point is you can assign any meaning you want to the different bit patterns. Nothing you're posting is contradicting that. – ikegami Sep 01 '19 at 09:08
  • If the type that C uses for the `enum` treats two bit patterns as both having the same value, you can't compare and use them in the normal way, only via memcmp and memcpy. Not with `switch` / `case`. But sure, if you want to do everything manually with memcpy and memcpy to use C as a portable assembler, then sure you can do whatever you want with any distinct bit patterns. – Peter Cordes Sep 15 '19 at 02:13
  • @PeterCordes, Look, I mentioned one's-complement machines is an exception in my answer before you even posted your first comment. I don't see what you were trying to accomplish weeks ago, and I don't know what you are trying to accomplish by starting again. – ikegami Sep 15 '19 at 02:16
  • (And a trap representation doesn't have to result in an exception, just UB, according to https://stackoverflow.com/questions/6725809/trap-representation) – ikegami Sep 15 '19 at 02:20
6

Adding to @Yunnosch's excellent explanation on unsigned numbers, almost all modern computers use "two's complement" to represent signed binary integers. In two's complement, the most significant bit is used as the "sign bit" and bits are the complement of absolute value of the number + 1. So for the 3 bit example, while the range for unsigned values is 0 to 7, the range for signed values is -4 to 3:

100 : -4
101 : -3
110 : -2
111 : -1
000 :  0
001 :  1
010 :  2
011 :  3

Notice that for signed numbers the range of negative numbers is one greater than the range of positive numbers. That's because, while in number theory, 0 is neither positive or negative, in binary representation, 0 has to be either negative or positive. Because it has the most significant bit cleared, 0 is part of the positive number domain, so that leaves one less positive number available.

daShier
  • 2,056
  • 2
  • 8
  • 14
  • Thank you @Yunnosch, I edited my post accordingly. It's been WAY too many years since I studied and I should have checked my work before posting. ;) – daShier Aug 29 '19 at 06:03
  • 3
    I am afraid the prhasing "the complement of absolute value of the number + 1" can be misread to describe -1 = ~(1+1) -> ~ 2 -> 101. The problem is the lack of "()" in prose, which allows to read it this way. Anybody who already knows the mechanisms reads your text correctly, so it is not wrong. But those who need explanation might get confused. – Yunnosch Aug 29 '19 at 06:05
  • Thanks for appreciating my input on the name. The prose phrasing is still tricky. (As you can see I have noticed my typo by now...) – Yunnosch Aug 29 '19 at 06:11
5

For an unsigned integer type, the value -1 is out of range and cannot be represented in a variable of that type. If you try to assign -1 to an unsigned int a conversion occurs according to the rules of the C standard.

The conversion of a signed value to an unsigned integer type is specified in section 6.3.1.3p2 of the C standard:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.60

...

60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression

Assuming as in your example that unsigned int has a value range of 0 to 4,294,967,295 the value -1 is converted by adding -1 + 4,294,967,296 = 4,294,967,295. Note that this conversion happens regardless of how negative numbers are represented on the given system. It is the same for two's complment, ones' compliment, or sign-and-magnitude.

Note that this means that the representation of the converted value need not be the same as the representation of -1.

Using a 4-bit type as an example, converting the value -1 to an unsigned type results in the value 15. The representation of these numbers is as follows:

                sign-and magnitude    ones' complement   two's complement
  -1   (signed)               1001                1110               1111
  15 (unsigned)               1111                1111               1111

While in the two's complement case the result of the conversion keeps the same representation, it changes in the other two cases. For one's complement the representation of -1 is the same as for 14, and for sign-and-magnitude the representation of -1 is the same as for 9.

So what other answers have described regarding two's complement is most likely how those implementations do it (i.e. reinterpreting the representation of -1 as an unsigned value), however from the perspective of C language as an abstract machine what I've described is the only correct way of performing this conversion.

dbush
  • 205,898
  • 23
  • 218
  • 273
  • 2
    This is a comprehensive and instructive coverage of the question how -1 is converted into an unsigned value in C. This is what I thought the question was when I read the title. After reading the question body and comment thread I was unsure whether the OP perhaps meant the "-1" in "2^32-1". – Peter - Reinstate Monica Aug 29 '19 at 19:59
3

Where did you find this incorrect paragraph? It appears to be about 2's complement but has the -1 in the wrong place.

For C implementations using one's complement or sign/magnitude signed integers, the range is symmetric around zero (with 2 bit patterns that both represent 0, so the positive range and the negative range are the same size).

Basically nothing ever uses that these days, but the ISO C standard specifies that signed integers are binary and use either two's complement, one's complement, or sign/magnitude.


In 2's complement (nearly universal these days), the range of representable values using n bits is [- 2n-1 , 2n-1 - 1 ]. One bit-pattern (all bits zero) represents the value zero. Every bit has a place-value of 2^i, except the final one which has a place value of -2^(n-1).

The bit-pattern with all bits set represents -1 because sum(2^i, i=0..n-1) is one less than 2^n.

With only the sign bit set, we get the most-negative number: -INT_MIN is signed overflow (undefined behaviour) because it's not representable as an int; it requires a wider integer. Or with wrapping, -INT_MIN = INT_MIN. This is the "2's complement anomaly". https://en.wikipedia.org/wiki/Two%27s_complement#Most_negative_number

You can avoid widening if you're doing an absolute value operation: e.g.
unsigned abs = i >= 0 ? i : -(unsigned)i;

(Converting a negative value to unsigned in C has well-defined behaviour of modulo-reducing until it's in the representable range. In C this is independent of the signed-integer encoding; what matters is the value. So (uint8_t)-1 is always 255. For 2's complement it just copies the bit-pattern. For sign/magnitude or one's complement a C implementation would have to do some math to cast from signed to signed. Notice that I did this before negation, which means 0 - i with the usual unsigned wrapping.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847