1

I have written the small program below.

#include <stdio.h>
int main(void){
    char a=-1;
    unsigned char b=-1;
    printf("%d %d\n",a,b);   /* print both as decimal */
    printf("%x %x\n",a,b);   /* print both as hex */
    if(a==b) printf("equal\n");
    else printf("not equal\n");
    return 0;
}

The output of the program is:

-1 255
ffffffff ff
not equal

Since char is only one byte and -1 is represented in two's complement form, I thought that 0xff would be stored in both a and b, and hence that the two should be equal. Can anyone explain why they are different, and why the hex representation of a is 0xffffffff and not 0xff? I found a related link, http://embeddedgurus.com/stack-overflow/2009/08/a-tutorial-on-signed-and-unsigned-integers/, but I couldn't find the answer there. Any help will be greatly appreciated. Thanks.

mah
mezda

4 Answers

5

They are the same. Or rather, their underlying representation is the same (assuming your compiler uses two's complement form).

On the other hand, the values they represent are -1 and 255.

When you pass them to printf, they are promoted to the data type int. An unsigned char is zero-extended, whereas a signed char is sign-extended, which accounts for the difference you see.

The same extension occurs when you compare the two values. a == b doesn't compare the underlying representations; instead, it promotes both values to int, so it compares 255 with -1, which are not equal.

Note that a plain char may be either signed or unsigned. In your environment, it is obviously signed.

Lindydancer
  • You said that the char type is sign-extended to int in the comparison. I have two questions: 1) Does such sign extension of char types to int always happen in any expression? 2) Does a similar extension happen from short int to int? I never knew about this in C; can you point to some links on it? – mezda Jan 09 '13 at 12:40
  • 1) In C, all operations *conceptually* occur in `int` (or larger, possibly unsigned, types). However, most operations can be reduced to a smaller machine operation. Concretely, adding two `char`s yields the same result regardless of whether the operation is performed in 8 bits or in 32 bits, if only the lower 8 bits are used. 2) Yes, this happens to all smaller integer types. A common "gotcha" in C is checking whether an unsigned char has all bits set using `~ch == 0`. This is always false, since `ch` is zero-extended to an `int` before the complement, so the upper bits of `~ch` are always set. – Lindydancer Jan 09 '13 at 12:50
  • This doesn't seem correct. According to the rules of type promotion `unsigned char` is promoted to `unsigned int`, only `signed char` is promoted to `int` (see ISO C standard). And according to the rules of implicit type conversion, comparing `int` to `unsigned int` causes both values to be converted to `unsigned int`. So the correct answer should be: It extends both values to `unsigned int`, not `int`. – Mecki Jan 09 '13 at 13:48
  • 2
    §6.3.1.1 "If an 'int' can represent all values of the original type, the value is converted to an 'int'; otherwise, it is converted to an 'unsigned int'." Hence, `unsigned char` is promoted to an `int`, as all its values, 0-255, can be represented by an 'int'. – Lindydancer Jan 09 '13 at 14:12
  • @Lindydancer : Thanks for the reply. Your answer differs from the Mecki's answer in that he is saying that both values will be converted to unsigned int (so that we will compare 4294967295 against 255) while you say that both values will be converted to int (so that we will compare -1 with 255). Can you please elaborate on what is correct here ? thanks. – mezda Jan 14 '13 at 19:01
  • 1
    The standard (which I quoted above) says that values that fit into an `int` (which those of both `signed char` and `unsigned char` do) will be promoted to an `int`. You can verify this by comparing the two using `>`; the result is that an unsigned 255 is larger than a signed -1. Had the operation been performed using `unsigned int`, it would have been the other way around, as -1 would have been seen as a very large number. – Lindydancer Jan 14 '13 at 19:38
  • Hmm... I guess you are right, at least regarding the ISO C standard. That means my C book is actually wrong, since it claims unsigned is promoted to unsigned. – Mecki Jan 14 '13 at 19:45
  • @Mecki: Interesting, which book is that? – Lindydancer Jan 16 '13 at 09:30
  • "C Pocket Reference" from O'Reilly. Though I'm not sure I have the latest edition and also I have a translated edition, thus maybe the translation is incorrect or misleading (the translator may have misunderstood the English original). – Mecki Jan 17 '13 at 16:05
2

The char type is something of an anomaly in that it is not the same as either signed char or unsigned char (unlike the other integer types - short, int, long, etc - which are implicitly signed unless explicitly declared unsigned). Whether char is actually signed or not is implementation-dependent, and some compilers even let you specify the signedness via a command line switch.

Bottom line: never assume that char is signed or unsigned - if you actually require a signed or unsigned 8 bit quantity then use signed char or unsigned char explicitly, or better still, use int8_t or uint8_t from <stdint.h>.

Paul R
  • Clearly, in the OP's implementation, `char` is signed, otherwise there would be no question. This answer bring an important bit of information, but does not answer the original question. – Pascal Cuoq Jan 09 '13 at 12:52
  • @Pascal: true - I was (wrongly) assuming that the OP understood signed versus unsigned integer behaviour and was confused about the nature of an unqualified char variable. – Paul R Jan 09 '13 at 13:15
2

A signed int is signed, and an unsigned int is unsigned. If you use just int, it implies signed int. The same is true for short, long and long long. Yet it isn't true for char. A signed char is signed, an unsigned char is unsigned, but plain char may be either signed or unsigned. The data type char is supposed to hold a "character", hence the name, so it's not "really" meant as an integer type for numbers used in calculations. Of course a character is in reality an integer of some kind, but which kind is implementation-defined (the C standard does not force any specific kind). So if you want to use the char type for integer values (including in calculations), always use signed char or unsigned char explicitly, and use plain char only when you are really dealing with characters or when it makes absolutely no difference to your code whether char is signed or unsigned.

The comparison fails because your implementation defines char to be in fact signed, so you are comparing a signed char to an unsigned char in your final if statement. Whenever you compare two integers of different types, the compiler converts both values to the same type according to the rules of the C standard before it actually performs the comparison. In your case, this means the C compiler actually does the following:

if((int)a==(int)b) printf("equal\n");
else printf("not equal\n");

And now it should be obvious why those two values don't match. (int)a has a value of -1, however (int)b has a value of 255, and these two values are not equal.

According to the rules of type promotion, char (in your case signed) is promoted to int and unsigned char is also promoted to int. The ISO C 2011 standard says:

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

The integer promotions preserve value including sign. As discussed earlier, whether a ‘‘plain’’ char is treated as signed is implementation-defined.

Mecki
  • Same as Paul R's answer, this is an important bit of information but does not answer the question. `char` is clearly defined by the OP's implementation as `signed`. – Pascal Cuoq Jan 09 '13 at 12:54
  • @Mecki : Thanks for the reply. Your answer differs from the Lindydancer's answer in that you are saying that both values will be converted to unsigned int (so that we will compare 4294967295 against 255) while she says both values will be converted to int (so that we will compare -1 with 255). Can you please elaborate on what is correct here ? thanks... – mezda Jan 14 '13 at 18:55
  • An `unsigned char` is promoted to a plain signed `int` (§6.3.1.1), so the resulting operation will be performed in `int` type. For `==` it doesn't matter, but it will if you use another comparison operation like `<`. – Lindydancer Jan 14 '13 at 19:41
  • 1
    @user1182722: Seems like Lindydancer is correct. I first went by the type promotion as described in my C book, which claimed unsigned to be promoted to unsigned types only. As I was unsure, I looked up the promotion rules in the latest ISO-C specification and according to this spec, `int` is correct. I quoted the spec in my updated reply above. – Mecki Jan 14 '13 at 19:51
0

While there is some ambiguity around a plain "char" (see Is char signed or unsigned by default?), I think that's not the only thing going on here.

The literal -1 has type int. Assigning it to a signed char keeps the value -1, while assigning it to an unsigned char converts it modulo 256, giving 255. On a two's complement machine (with a 32-bit int, for argument's sake) both amount to truncating the bit pattern 0xffffffff to its low byte, 0xff.

When you call printf() the parameters are promoted to int: the signed type is sign-extended, whereas the unsigned "b" is zero-extended. When you use "==" with two distinct types, a similar (but not necessarily identical) type conversion is performed (the "usual arithmetic conversions").

See also Default argument promotions in C function calls and Signed and unsigned, and how bit extension works in C.

Community
mr.spuratic