3

Why are '\x90' and 0x90 different from each other? I understand that one is a hexadecimal escape sequence and the other is a hexadecimal number. However, if I convert them to decimal I get 144, which should be the value of both '\x90' and 0x90. Also, my book says that '\x90' has a negative value whereas 0x90 is positive.

To my knowledge char is only 1 byte and int is 4, so we would get

char '\x90' = 1001 0000 (1 byte, 8 bits)
int 0x90 = 0000 0000 0000 0000 0000 0000 1001 0000 (4 bytes, 32 bits)

Still, I fail to understand why the char '\x90' is negative and leads to a different value than the int 0x90.

My question is not about signed and unsigned char, although that relates to my question; I am asking about the int values of those characters.

holahola
  • 2
    It appears that char is a signed type in your compiler (a common option). Because char is only 8 bits, it can only represent positive 0..127, and -1..-128. 0x90 is 144, so it can't fit into 0..127. It "overflows" and ends up representing -112. But that's purely interpretation--the bits are all the same in any case. – Lee Daniel Crocker Mar 07 '18 at 22:41
  • Could you explain the process how it ends up being -112 ? – holahola Mar 07 '18 at 22:44
  • 3
    Google "twos complement" – Lee Daniel Crocker Mar 07 '18 at 22:44
  • If I calculate twos complement in binary wouldn't it be the same for both values ? Since the process is converting them to binary , flipping the 0s and 1s for one's complement then adding 1 to get twos complement. – holahola Mar 07 '18 at 22:48
  • @holahola Do you know why `a` and `b` returns `-1` [HERE](https://ideone.com/lGxkDp) and `c` and `d` they do not? – Michi Mar 07 '18 at 22:49
  • my guess is since unsigned is never negative ,it doesnot return -1 – holahola Mar 07 '18 at 22:54
  • @holahola Do not stop there. If int is 4 bytes then you have `0000 0000 0000 0000`. Why do you get `-1` and not `1111 1111 1111 1111`, decimal `65535`? I just inverted all bits. That is your main problem here also. – Michi Mar 07 '18 at 22:59
  • 3
    Possible duplicate of [Is char signed or unsigned by default?](https://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default) – phuclv Mar 08 '18 at 01:14
  • 0x90 - 256 = -112, simple as that. That's how two's complement works – phuclv Mar 08 '18 at 01:18

5 Answers

2

In C, '\x90' and 0x90 are both constants of type int, but they may have different values if the char type is signed and has 8 bits. In that case, '\x90' has a value of -112 whereas 0x90 is always 144.

The C Standard specifies this:

6.4.4.4 Character constants.

§10 An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Hence the character constant '\x90' has a value of (int)(char)0x90, which is 144 if the char type is unsigned by default or is wider than 8 bits. Otherwise its value is -112, as seems to be the case on your system.
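The rule quoted above can be sketched without depending on how the local compiler treats plain `char`. This is only an illustration: `char_constant_value` and `show_x90` are hypothetical names, and the sketch assumes an 8-bit `char` that uses two's complement when signed.

```c
#include <stdio.h>

/* Hypothetical helper: the value an integer character constant such as
 * '\x90' would have, ASSUMING char is 8 bits wide and, when signed, uses
 * two's complement. byte is the raw value of the escape sequence (0..255). */
int char_constant_value(unsigned byte, int char_is_signed)
{
    if (char_is_signed && byte > 127u)
        return (int)byte - 256;   /* 0x90: 144 - 256 = -112 */
    return (int)byte;             /* unsigned char: value is unchanged */
}

/* Prints both interpretations next to the plain integer constant 0x90. */
void show_x90(void)
{
    printf("0x90              = %d\n", 0x90);
    printf("'\\x90', signed    = %d\n", char_constant_value(0x90u, 1));
    printf("'\\x90', unsigned  = %d\n", char_constant_value(0x90u, 0));
}
```

Calling `show_x90()` from `main` prints the three values side by side. Only the signed case differs from 144.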

chqrlie
  • Thank you , but I really do not get the part how '\x90' has a value of -112 . how do I find twos complement of '\x90' ? Isn't it same as twos complement of 0x90? – holahola Mar 07 '18 at 23:04
  • the `char` type can be signed or unsigned depending on the compiler default settings. If it is signed, character constants with a non zero bit value for the 8th bit will be negative. – chqrlie Mar 07 '18 at 23:11
2

A char is 1 byte, which is 8 bits on typical platforms. If we treat it as unsigned (only non-negative numbers), then 0x90 = 144 fits without any problem.

But on your implementation char is signed, meaning that one bit is reserved to indicate positive or negative (the sign bit). Therefore only 7 bits are left for the largest positive number: 2^7 = 128. When you try to assign 0x90 to a char, the value is larger than the largest positive value. Converting an out-of-range value to a signed type gives an implementation-defined result; it is not undefined behavior, which is what overflow in signed arithmetic would be.

Most implementations will just wrap around to the negatives, so it instead becomes -128 - (128-144) = -128 + 16 = -112.

The bits may be the same, but the interpretation is not.
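The wrap-around described above can be written out as plain arithmetic. This is a sketch of what most compilers do, not something the C standard guarantees; `wrap_to_int8` is a hypothetical name and an 8-bit signed char is assumed.

```c
/* Models the common wrap-around rule for an 8-bit signed char:
 * add or subtract 256 until the value lies in -128..127.
 * This mirrors typical compiler behavior; the C standard itself does
 * not promise this result for out-of-range values. */
int wrap_to_int8(int value)
{
    while (value > 127)
        value -= 256;
    while (value < -128)
        value += 256;
    return value;
}
```

With this model, `wrap_to_int8(144)` gives -112, while any value already in -128..127 is returned unchanged.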

(Disclaimer: The actual largest positive value you can hold in 7 bits is 127, and I said what I said because it makes the most intuitive sense. 0 is one of the values that must be accounted for, so the real formula is 2^N - 1 where N is the number of value bits. Consider 1 bit; the maximum value is 1 even though 2^1 = 2.)

AndyG
  • 2
    to be precise, [`char` can have more than 8 bits](https://stackoverflow.com/q/2098149/995714), and [it can be signed or unsigned](https://stackoverflow.com/q/2054939/995714) – phuclv Mar 08 '18 at 01:15
2

Why are '\x90' and 0x90 different from each other(?)

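The first is a character constant written with a hexadecimal escape sequence and the second is a hexadecimal integer constant. They have the same type, int, but the same value only when char is unsigned or wider than 8 bits.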


I fail to understand why the char x90 is negative and leads to difference value than int 0x90.

They both have the same value when assigned to a char.


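'\x90', 0x90 and 144 are all constants of type int in C. 0x90 and 144 always have the value 144. The value of '\x90' is 144 when char is unsigned, and -112 when char is a signed 8-bit type (C11 6.4.4.4 §10).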
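The type claim can be checked directly with a small sketch. It assumes a C11 (or later) compiler because it uses `_Generic`; the two function names are made up for illustration.

```c
/* Returns 1 if the character constant '\x90' has type int, 0 otherwise.
 * In C (unlike C++) character constants have type int. */
int char_constant_is_int(void)
{
    return _Generic('\x90', int: 1, default: 0);
}

/* For contrast: an object declared as char really does have type char. */
int char_object_is_char(void)
{
    char c = 'A';
    return _Generic(c, char: 1, default: 0);
}
```

Both functions return 1 when compiled as C. The same conclusion follows from `sizeof('\x90') == sizeof(int)`.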

A char will either act like a signed char or unsigned char. Apparently in OP's case, it acts like a signed char with a range of [-128 ... 127].


Consider char ch = 144;

Assigning 144, which is out of range for OP's char, results in implementation-defined behavior. This means the implementation can do all sorts of things, such as assigning the maximum value as if ch = 127;. The most common implementation-defined behavior is to repeatedly add or subtract 256 until the result is in range. Here that is 144 - 256 --> -112.

When looking at 144 as an 8-bit unsigned char and -112 as 8-bit signed char, they both have the same bit pattern 1001 0000.
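A short sketch makes the shared bit pattern visible. It assumes 8-bit bytes and two's complement for signed char (true on essentially every current platform); `byte_to_bits` and `same_bit_pattern` are hypothetical helper names.

```c
#include <string.h>

/* Writes the 8 bits of byte into out as '0'/'1' characters, most
 * significant bit first, followed by a terminating '\0'.
 * out must have room for at least 9 chars. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (byte & (0x80u >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Returns 1 if the unsigned value 144 and the signed value -112 occupy
 * identical storage (assumes two's complement), 0 otherwise. */
int same_bit_pattern(void)
{
    unsigned char u = 144;
    signed char s = -112;
    return memcmp(&u, &s, 1) == 0;
}
```

Passing 144 to `byte_to_bits` fills the buffer with "10010000", the pattern quoted above.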

chux - Reinstate Monica
  • *All 3 have the ... same value: 144.* not true if `char` is signed by default. – chqrlie Mar 08 '18 at 00:41
  • 1
    @chqrlie `'\x90', 0x90, 144` are all integers of type `int` with the same value. `char` is not involved at all at this point. Sign-ness of `char` is irrelevant in determining the type/value of these constants – chux - Reinstate Monica Mar 08 '18 at 00:42
  • `char` should not be involved, but it is... all three indeed have type `int` in C, but not the same value if `char` is signed and 8-bit wide. – chqrlie Mar 08 '18 at 00:44
  • @chqrlie The determination of a _integer constant_ value does not involve `char` in any way. C does not have `int` constant literals. Perhaps you are thinking of another language? – chux - Reinstate Monica Mar 08 '18 at 00:45
  • As counterintuitive as it may be, **It does**: C11 6.4.4.4 §10: *An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. ... If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type `char` whose value is that of the single character or escape sequence is converted to type `int`.* – chqrlie Mar 08 '18 at 01:00
  • @chqrlie Agree that `char` does affect an integer character constant contains a single character or escape sequence as you have well cited. – chux - Reinstate Monica Mar 08 '18 at 01:56
  • Yes, the conclusion is that up to 127 the hex escape sequence and the int hex constant are the same, and beyond that the char overflows because the value is out of its range, as in this case (144) – holahola Mar 08 '18 at 20:22
  • Whereas in Java the case might be completely different, as char is 2 bytes – holahola Mar 08 '18 at 20:23
1

Both denote the same 8-bit pattern. The difference is in where they are used and how the value is obtained.

\x90 is a hexadecimal escape sequence. It is only valid inside single quotes (a character constant) or double quotes (a string literal). In C the character constant '\x90' has type int, not char. 0x90 is a hexadecimal integer constant of type int, and it is not used within quotes.

As for positive / negative, integer constants have type int unless they have a suffix denoting the type or are too large for int. Since 0x90 fits inside the range of an int, it has the positive value 144. If you assign it to a variable of type char and char is a signed 8-bit type, the value lies outside the range of char and is converted in an implementation-defined manner.

Similarly, the value of the escape sequence \x90 must fit in an unsigned char, which it does. When it is used in a character constant such as '\x90', the value is the one that results from converting a char holding that byte to int, so with a signed 8-bit char it is negative.

For example:

int a = 0x90;           // valid, has value 144
int b = '\x90';         // valid, has value -112 if char is signed and 8 bits wide, otherwise 144
char c = 0x90;          // valid, but 144 is out of range for a signed 8-bit char: implementation-defined result, typically -112
char d = '\x90';        // valid, same resulting value as above
unsigned char e[] = "\x90\x90";  // valid, array of three bytes: 0x90, 0x90 and the terminating 0
char f[] = "\x90\x90";  // valid, same bytes; each 0x90 element reads as -112 if char is signed
char g = \x90;          // invalid, compile error: escape sequence outside quotes
char h[] = "0x90";      // valid, but contains the characters '0', 'x', '9', '0'
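For the string cases, here is a small sketch of how the bytes can be inspected independently of the signedness of plain `char`. The function names are hypothetical; the only technique used is reading each element through `unsigned char`.

```c
#include <stddef.h>

/* Size in bytes of an array initialised from "\x90\x90":
 * two data bytes plus the terminating '\0'. */
size_t x90_string_size(void)
{
    char f[] = "\x90\x90";
    return sizeof f;
}

/* Reads the first byte through unsigned char, which yields a value in
 * 0..255 whether plain char is signed or not. */
unsigned first_byte_unsigned(void)
{
    char f[] = "\x90\x90";
    return (unsigned char)f[0];
}
```

Reading through `unsigned char` is the usual way to get the raw byte value 0x90 back out of a `char` array.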
dbush
1

Without seeing your code here is one possibility:

char c = '\x90'; // 1001 0000 in binary
int i  = 0x90;   // 0000 0000 0000 0000 0000 0000 1001 0000 in binary (32-bit int)

If you then do something like this:

i = (int) c;    // i is 0xffffff90 if char is signed
                // the cast is not necessary in C; it only makes the conversion explicit

the sign bit (the leftmost bit of the char) is copied into all the added bits on the left.

EDIT: A char is 8 bits wide and an int is 32 bits wide. When a signed char is converted to int, its leftmost bit is copied into every new bit position. So char c is 1001 0000 (0x90), and after conversion to int the value is 1111 1111 1111 1111 1111 1111 1001 0000 (0xffffff90): the leading 1 of the char is copied to the left, which gives a negative value.

As a rule, any signed int or signed char with its leftmost bit set to 1 is negative, so after char c = 0x90, c is negative when char is signed.
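The sign extension described above can be sketched with unsigned arithmetic only, so it does not depend on whether plain `char` is signed. The names `sign_extend_8_to_32` and `bits_as_int32` are made up for this illustration, and two's complement is assumed for the interpretation step.

```c
#include <stdint.h>

/* Sign-extends an 8-bit two's-complement pattern to 32 bits by copying
 * bit 7 into bits 8..31. */
uint32_t sign_extend_8_to_32(uint8_t byte)
{
    uint32_t wide = byte;
    if (byte & 0x80u)
        wide |= 0xFFFFFF00u;   /* fill the upper 24 bits with ones */
    return wide;
}

/* Interprets a 32-bit pattern as a two's-complement signed value.
 * A set top bit means negative: the value is -(~bits) - 1. */
int32_t bits_as_int32(uint32_t bits)
{
    if (bits & 0x80000000u)
        return -(int32_t)(uint32_t)~bits - 1;
    return (int32_t)bits;
}
```

For 0x90 the extended pattern is 0xffffff90, and reading that pattern as a signed 32-bit value gives -112: the top bit is set, which is what makes it negative.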

  • casting c to int, when coverting `char` to `int` you do not have to do it but it is done. –  Mar 07 '18 at 23:06
  • 1
    `i = int (c)` is a syntax error in C. The convention for the position of the sign bit is the *leftmost*, not the *rightmost*. – chqrlie Mar 07 '18 at 23:06
  • Sorry will fix mistake, I was thinking in c++ –  Mar 07 '18 at 23:06
  • So by casting it becomes ffffff90 , but what makes that negative number ? Shouldn't the leftmost bit be 1 for a binary number to become negative. – holahola Mar 07 '18 at 23:10
  • @holahola If [This still does not help](https://ideone.com/9SgEwz), then nothing helps. – Michi Mar 07 '18 at 23:11
  • “By the rule any int or char with most left bit set to 1 is negative.” You mean signed int/char or You mean that This aplly to unsigned too? – Michi Mar 07 '18 at 23:22
  • @Michi yes, I mean signed lets not confuse the hell out'a him right now since he does not know how the negative number is represented in memory. –  Mar 07 '18 at 23:24
  • I am not making (trying to make) a confusion here,but the OP can if you understand my Point of view. – Michi Mar 07 '18 at 23:27
  • @Michi actually I am trying not to confuse the hell out of him. :) it is not that you are confusing him. –  Mar 07 '18 at 23:29
  • My point was that this “By the rule any int or char with most left bit set to 1 is negative.” should be changed to this “By the rule any signed int or signed char with most left bit set to 1 is negative.” ... or maybe I am wrong. – Michi Mar 07 '18 at 23:30
  • 1
    @Michi I fixed it in my answer. Thanks for pointing that out. –  Mar 07 '18 at 23:30
  • You are right GRC, it actually does copy all the 1s to left while casting .. that clears my week long confusion ! – holahola Mar 08 '18 at 20:21