5

I am a beginner in C. I have recently learned about 2's complement and other ways to represent negative numbers, and why 2's complement is the most appropriate one.

What I want to ask is, for example:

int a = -3;
unsigned int b = -3; //This is the interesting Part.

Now, for the conversion of the int type:

The standard says:

6.3.1.3 Signed and unsigned integers

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

The first paragraph doesn't apply, since -3 can't be represented by unsigned int.

Therefore paragraph 2 comes into play, and we need to know the maximum value for unsigned int. It can be found as UINT_MAX in limits.h. The maximum value in this case is 4294967295, so the calculation is:

-3 + UINT_MAX + 1 = -3 + 4294967295 + 1 = 4294967293  

Now 4294967293 in binary is 11111111 11111111 11111111 11111101, and -3 in 2's complement form is 11111111 11111111 11111111 11111101, so they have essentially the same bit representation. It would always be the same no matter what negative integer I try to assign to unsigned int. So isn't the unsigned type redundant?
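
A small program, assuming a platform where unsigned int is 32 bits (so UINT_MAX is 4294967295), that lets you check this calculation; printing an unsigned int with %u is well defined:

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int b = -3;            /* converted per 6.3.1.3: -3 + UINT_MAX + 1 */

    printf("UINT_MAX = %u\n", UINT_MAX);
    printf("b        = %u\n", b);   /* 4294967293 when unsigned int is 32 bits */
    return 0;
}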

Now I know that printf("%d", b) is undefined behavior according to the standard, but isn't that a reasonable and more intuitive way to do things? The representation will be the same as long as negatives are represented in 2's complement, which is what we have now; other representations are rare and most probably will not appear in future developments.

So suppose we had only one type, say int. If int x = -1, then %d would check the sign bit and print a negative number if the sign bit is 1, while %u would always interpret the plain binary digits (bits) as they are. Addition and subtraction are already dealt with because of 2's complement. So isn't this a more intuitive and less complex way to do things?

  • 2
    The standard simply describes the behaviour that 99.9% of modern CPUs expose when interpreting an unsigned number as a signed. So, yes, most machines do what the standard says. But signed and unsigned are *not redundant* because of that. – tofro Dec 31 '16 at 11:28
  • @tofro Can you elaborate a little bit? I do not understand you. –  Dec 31 '16 at 11:31
  • 2
    E.g. for comparisons it is crucial to know if the number has to be interpreted as signed or unsigned. – Ctx Dec 31 '16 at 12:15

4 Answers

7

It's handy to have both for input, output and computation. For example, comparison and division come in signed and unsigned varieties (btw, at the bit level multiplication is the same for unsigned and 2's complement signed types, just like addition and subtraction, and both may compile into the same multiplication instruction of the CPU). Further, unsigned operations do not cause undefined behavior in case of overflow (except for division by zero), while signed operations do. Overall, unsigned arithmetic is well defined and unsigned types have a single representation (unlike three different ones for signed types, although these days in practice there's just one).
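
A small sketch, assuming a 32-bit, 2's complement int, showing how the same bit pattern compares and divides differently once the type says signed or unsigned:

#include <stdio.h>

int main(void) {
    int          s = -3;               /* bit pattern 0xFFFFFFFD */
    unsigned int u = (unsigned int)s;  /* same bits, value 4294967293 */

    printf("s < 1 : %d\n", s < 1);     /* 1: -3 is less than 1 */
    printf("u < 1 : %d\n", u < 1u);    /* 0: 4294967293 is not */
    printf("s / 2 = %d\n", s / 2);     /* -1 (truncated toward zero) */
    printf("u / 2 = %u\n", u / 2);     /* 2147483646 */
    return 0;
}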

There's an interesting twist. Modern C/C++ compilers exploit the fact that signed overflows result in undefined behavior. The logic is that it never happens and therefore some additional optimizations can be done. If it actually happens, the standard says it's undefined behavior, and your buggy program is legally screwed. What this means is that you should avoid signed overflows and all other forms of UB. However, sometimes you can carefully write code that never results in UB, but is a bit more efficient with signed arithmetic than with unsigned.
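
Here is a sketch of the kind of code this affects (the function names are made up for illustration); an optimizing compiler may fold the signed version down to "return 1", while the unsigned version must keep the real comparison:

#include <limits.h>

/* Signed: the compiler may assume x + 1 never overflows (UB when x == INT_MAX)
 * and optimize the whole body to "return 1". */
int is_bigger_signed(int x) {
    return x + 1 > x;
}

/* Unsigned: wrap-around is well defined, so the comparison must really happen;
 * the result is 0 when x == UINT_MAX. */
int is_bigger_unsigned(unsigned int x) {
    return x + 1u > x;
}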

Please study the undefined, the unspecified and the implementation-defined behaviors. They are all listed at the end of the standard in Annex J.

Alexey Frunze
  • Also See This http://stackoverflow.com/questions/41406570/how-is-int-type-changed-to-signed-and-unsigned-type-and-what-is-the-rationale-be –  Dec 31 '16 at 13:03
  • Which book should I start with to know all of this? –  Jan 02 '17 at 06:36
  • @Stranger [The 1999 C standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) is the best (most detailed). Easier to read but a bit dated is [The C Programming Language 2nd Edition](https://www.amazon.com/Programming-Language-Brian-W-Kernighan/dp/0131103628). Additional reading: [The Development of the C Language by Dennis M. Ritchie](https://www.bell-labs.com/usr/dmr/www/chist.html), [Rationale for C99](http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf), [The New C Standard: An Economic and Cultural Commentary](http://www.knosof.co.uk/cbook/cbook.html). – Alexey Frunze Jan 02 '17 at 08:32
  • @Stranger I would not award you a prize for it, but it has some merit. :) – Alexey Frunze Jan 02 '17 at 09:57
  • I was asking to confirm I am heading in the right direction. I have two accounts on this site, and this site is not beginner friendly; it is hard for a beginner because most people here just come to downvote a question. That is my experience. –  Jan 02 '17 at 09:58
  • See this question: http://stackoverflow.com/questions/34826036/confused-about-pointer-dereferencing/41349954#34826630 Would you consider this a bad question? –  Jan 02 '17 at 10:00
  • @Stranger Way too many people fail to do basic research before asking questions. Often times questions are poorly stated. – Alexey Frunze Jan 02 '17 at 10:01
  • Was the question I provided a link to a bad question? The purpose of this site is to provide people with answers; this is a Q&A site, and yet they expect us to know everything. Earlier, general-purpose questions like "Definitive guide to C books" were asked and were extraordinarily useful, but then something happened and moderators now just mark questions as duplicates and close them. –  Jan 02 '17 at 10:03
  • 1
    @Stranger The thing is, if the question shows a serious knowledge gap, a good answer would likely be unreasonably big or incomprehensible and require many follow-up questions. Such a question is not good for SO. – Alexey Frunze Jan 02 '17 at 10:15
  • Note that multiplication isn't the same for signed and unsigned types. Example: 4-bit-by-4-bit multiplier, with `0001` applied to one input, and `1000` applied to the other input. An unsigned multiplier has to produce `00001000`, while a 2's complement multiplier has to produce `11111000`. – EML Jun 07 '18 at 17:28
  • @EML C/C++ doesn't produce products with more bits than multiplicands. int \* int = int. unsigned long \* unsigned long = unsigned long. If you use types smaller than int (like char or short), they get converted to int (possibly unsigned) before multiplication, so you're back at int \* int = int after the conversion. Likewise mixing int and long converts the int to a long and the multiplication is long \* long = long. And if you mix signed and unsigned you still arrive at typeX \* typeX = typeX. So, if integers are 2's complement, the multiplication is the same for signed and unsigned in C/C++ – Alexey Frunze Jun 10 '18 at 08:14
  • hmm.. C (let's just say C) does produce products with more bits than multiplicands; it just doesn't return them to you (unless you explicitly take steps to double your word size). The result is wrong if the inputs can't be represented in half the word size. Additionally, for an *n*x*n* multiply, both 2'sC and unsigned multiply hardware (ie machine instructions) produce the same result in the bottom *n* bits of the 2x*n*-bit result, so in some limited sense 'signed' and 'unsigned' multiplies are the 'same' in C. If you know what you're doing. – EML Jun 10 '18 at 11:00
  • @EML Write a function that takes two ints and returns their product. Write another one with unsigned ints. Feed them into your favorite x86 compiler and observe the identical machine code generated for both. – Alexey Frunze Jun 11 '18 at 11:10
  • I'm not quite sure what your point is. Are you saying that plain binary and 2'sC multiplication *are* the same at the hardware level? Or that you don't need *n* + *m* bits to represent the result of an *n* * *m* multiply? Both of these are hardware and assembler 101. If you have two C test programs that produces identical assembler, then they're not testing the relevant cases, or they're using incorrect word widths, or they're producing incorrect results. – EML Jun 12 '18 at 10:48
  • 1
    @EML I'm pretty sure the point is that in your example of multiplying `0001` by `1000`, C is very clear that the result should be `1000`, *not* `xxxx1000`. That the hardware multiplier produces more bits for the product is irrelevant because from the perspective of the C standard, those bits don't exist. And it is because the top half of the bits is basically thrown away that C permits an implementation to treat signed and unsigned multiplication as the same thing (so long as the compiler effectively follows all of the "standard arithmetic conversion" rules beforehand). – mtraceur Jan 03 '19 at 18:50
  • @mtraceur - AF's answer says that "btw, at the bit level multiplication is the same for unsigned and 2's complement signed types, just like addition and subtraction". My comment was the polite way of saying "this is wrong". At "the bit level", they are very different. At the bit level, addition and subtraction are identical; the difference is in interpretation. This is not true of multiplication. The point of my example was to demonstrate this. I said nothing about C; that was AF's interpretation, rather than admitting that bit-level multiplication differs. – EML Jan 05 '19 at 01:48
  • @EML I asked you to make [this experiment](https://godbolt.org/z/NiTf7F). What do you say about it? – Alexey Frunze Jan 05 '19 at 05:52
3

My answer is more abstract: in my opinion, in C you should not care about the representation of integers in memory. C abstracts this away for you, and that is a very good thing.

Declaring an integer as unsigned is very useful: it asserts that the value will never be negative. Just as floating-point numbers handle real numbers, signed integers handle... integers, and unsigned integers handle natural numbers.

When you write an algorithm in which a negative integer would lead to undefined behavior, you can be sure that your unsigned value will never be negative. For example, when you iterate over the indices of an array, a negative index would lead to undefined behavior.

Another case is a public API: when one of your functions requires a size, a length, a weight or anything else that makes no sense as a negative value, an unsigned parameter helps the user understand the purpose of that value.
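
For instance, a sketch of such a prototype (draw_text is a made-up name); size_t says by itself that a negative length is meaningless:

#include <stddef.h>

/* A length can never be negative, so size_t states that in the signature. */
void draw_text(const char *text, size_t length);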


On the other hand, some people disagree, because unsigned arithmetic doesn't work the way people first expect: when an unsigned value equal to zero is decremented, it wraps around to a very big value, whereas some people expect it to become -1. For example:

// wrong
for (size_t i = n - 1; i >= 0; i--) {
  // important stuff
}

This produces an infinite loop, or something even worse if n equals zero. The compiler will probably detect it, but not every time:

// wrong
size_t min = 0;
for (size_t i = n - 1; i >= min; i--) {
  // important stuff
}

Doing this with unsigned integers requires a little trick:

size_t i = n;
while (i-- > 0) {
  // important stuff
}
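
For instance, a complete version of that pattern, walking an array from the last element down to the first (the array contents are just an example):

#include <stdio.h>

int main(void) {
    int values[] = {10, 20, 30};
    size_t n = sizeof values / sizeof values[0];

    size_t i = n;
    while (i-- > 0) {   /* visits n-1, ..., 0 and is safe even when n == 0 */
        printf("values[%zu] = %d\n", i, values[i]);
    }
    return 0;
}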

In my opinion, it's very important to have unsigned integers in a language, and C would not be complete without them.

Stargateur
  • You don't care about type representation until you need to talk to the outer world or until you need a quick way of doing something the language doesn't support directly, while the hardware does. – Alexey Frunze Dec 31 '16 at 15:51
2

I think a major reason is that operators and operations depend on the signedness.

You've observed that add/subtract behave the same for signed and unsigned types if signed types use 2's complement (and you've been ignoring the fact that this "if" is sometimes not the case).

There are numerous cases where the compiler needs the signedness information to understand the purpose of the program.

1. Integer promotion.

When a narrower type is converted to a wider type, the compiler will generate code depending on the operands' types.

E.g. if you convert a signed short to a signed int and int is wider than short, the compiler generates code that does the conversion, and that conversion is different from the one for "unsigned short" to "signed int" (sign extension or not).
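
A small sketch of that difference, assuming a 16-bit short and a 32-bit int:

#include <stdio.h>

int main(void) {
    signed short   ss = -1;       /* bit pattern 0xFFFF */
    unsigned short us = 0xFFFFu;  /* same bit pattern */

    int a = ss;  /* sign-extended: -1 */
    int b = us;  /* zero-extended: 65535 */
    printf("%d %d\n", a, b);
    return 0;
}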

2. Arithmetic right shift

-1>>1 can still be -1 if the implementation chooses to (the result is implementation-defined), but 0xffffffffu>>1 must be 0x7fffffffu.

3. Integer division

Similarly, -1/2 is 0, while 0xffffffffu/2 is 0x7fffffffu.
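
A sketch that makes points 2 and 3 concrete, assuming a 32-bit, 2's complement int (the signed shift result is implementation-defined, but commonly arithmetic):

#include <stdio.h>

int main(void) {
    int          si = -1;
    unsigned int ui = 0xffffffffu;        /* same bits as -1 under this assumption */

    printf("si >> 1 = %d\n",   si >> 1);  /* typically -1 (arithmetic shift) */
    printf("ui >> 1 = 0x%x\n", ui >> 1);  /* 0x7fffffff (logical shift) */
    printf("si / 2  = %d\n",   si / 2);   /* 0 (truncates toward zero) */
    printf("ui / 2  = 0x%x\n", ui / 2);   /* 0x7fffffff */
    return 0;
}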

4. A 32-bit by 32-bit multiply, with a 64-bit result:

This is a little hard to explain, so let me use code instead.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int32_t a=-1;
    int32_t b=-1;
    int64_t c = (int64_t)a * b;
    printf("signed: 0x%016"PRIx64"\n", (uint64_t)c);

    uint32_t d=(uint32_t)-1;
    uint32_t e=(uint32_t)-1;
    uint64_t f = (uint64_t)d * e;
    printf("unsigned: 0x%016"PRIx64"\n", f);

    return 0;
}

Demo: http://ideone.com/k30nZ9

5. And of course, comparison.


One can design a signedness-less language, but then a lot of operators need to be split into two or more versions so that the programmer can express the purpose of the program: e.g. operator / needs to be split into udiv and sdiv, operator * needs to be split into umul and smul, integer promotion needs to be explicit, operator > needs to become scmpgt/ucmpgt, and so on.

That would be a horrible language to use, wouldn't it?


Bonus: all pointers usually have the same bit representation, but they behave differently under the operators [], ->, *, ++, --, + and -.
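
A minimal sketch of that bonus point: the same address, viewed through two pointer types, moves by a different number of bytes under + 1:

#include <stdio.h>

int main(void) {
    int  data[4] = {0};
    int  *pi = data;
    char *pc = (char *)data;  /* same address, different type */

    /* p + 1 advances by sizeof(*p) bytes. */
    printf("pi + 1 moves %td byte(s)\n", (char *)(pi + 1) - (char *)pi);
    printf("pc + 1 moves %td byte(s)\n", (pc + 1) - pc);
    return 0;
}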

user3528438
0

Well, the easiest and most general answer is memory management: every variable in the C language reserves some memory space in main memory (RAM) when we declare it. For example, unsigned int var; will reserve 2 or 4 bytes (depending on the implementation) and will range from 0 to 65,535 or from 0 to 4,294,967,295 respectively.

A signed int, by contrast, will range from -32,768 to 32,767 or from -2,147,483,648 to 2,147,483,647.

The point is that sometimes you have only positive numbers, which can't be negative; for example, your age obviously can't be negative, so you would use unsigned int. Similarly, when dealing with values that can be negative within the corresponding signed int range, we use signed int instead. In short, good programming practice is to use the appropriate data types for our needs, so that we use computer memory effectively and our programs are more compact.
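
Rather than assuming 2 or 4 bytes, limits.h reports what your implementation actually provides; a quick check:

#include <stdio.h>
#include <limits.h>

int main(void) {
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("INT_MIN     = %d\n", INT_MIN);
    printf("INT_MAX     = %d\n", INT_MAX);
    printf("UINT_MAX    = %u\n", UINT_MAX);
    return 0;
}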

As far as I know, 2's complement is always relative to a specific data type, or more specifically to a given width. We cannot tell from a bit pattern alone whether it is the 2's complement of a specific number or not. And since the computer deals with binary, the number of bytes matters: for example, the 2's complement of 7 in 8 bits is different from the one in 32 bits or 64 bits.
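
For example, a small sketch printing the 2's complement of 7 (i.e. the bit pattern of -7) at three different widths:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint8_t  w8  = (uint8_t)-7;   /* 0xf9 */
    uint32_t w32 = (uint32_t)-7;  /* 0xfffffff9 */
    uint64_t w64 = (uint64_t)-7;  /* 0xfffffffffffffff9 */

    printf("8-bit : 0x%02"  PRIx8  "\n", w8);
    printf("32-bit: 0x%08"  PRIx32 "\n", w32);
    printf("64-bit: 0x%016" PRIx64 "\n", w64);
    return 0;
}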