Simple Character Interpretation In C

Question

Here is my code

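 #include <stdio.h>

 int main(void)
 {
     char ch = 129;      /* out of range if char is 8-bit and signed */
     printf("%d", ch);   /* ch is promoted to int before printing */
     return 0;
 }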

I get the output as -127. What does it mean?

Here is the part I do not understand: 129 is the code that maps to ü. Second, how does the compiler store the number 129 in the memory of a char if a char can itself hold only up to 128 values? To store or reach the character ü it first has to see 129, but it does not have the memory to store 129. How does this happen? — , Feb 03 '12 at 08:10
http://stackoverflow.com/questions/4240748/allowing-signed-integer-overflows-in-c-c — jmq, Feb 03 '12 at 08:13
I edited my comment to explain in detail the process by which the processor (not the compiler) stores that shiny 129. (With ASCII art!!) — whtlnv, Feb 03 '12 at 10:22

whtlnv · Answer 1 · 2012-02-03T21:41:35.607

It means that char is an 8-bit variable that can hold only 2^8 = 256 distinct values. Because the declaration is plain char ch, and char is apparently signed on your platform, those 256 values are split into 128 negative values (-128 to -1) and 128 non-negative values (0 to 127). When you ask it to go over 127, the value starts over from -128.

Think of it like some arcade games where you go from one side of the screen to the other:

ch = 50;

                                    ----->                        50 is stored
      |___________________________________|___________|           since it fits
    -128                       0         50          127          between -128
                                                                  and 127

ch = 129;

                                                    ---           129 goes over
      -->                                                         127 by 2, so
      |__|____________________________________________|           it 'lands' in
    -128  -127                 0                     127          -127

BUT!! you shouldn't rely on this since it's undefined behaviour!

In honor of Luchian Grigore here's the bit representation of what's happening:

A char is a variable that holds 8 bits, or one byte. So we have eight 0s and 1s struggling to represent whatever value you desire. If the char is signed, those bits must also encode whether the number is positive or negative. You have probably read about one bit representing the sign; that is a simplification of the real scheme, and it was in fact only one of the first solutions implemented in electronics. But such a trivial method had a problem: you would have two ways of representing 0 (+0 and -0):

0 0000000     ->    +0        1 0000000     ->    -0                    
^                             ^ 
|_ sign bit 0: positive       |_ sign bit 1: negative

Inconsistencies guaranteed!! So, some very smart folks came up with a system called Ones' Complement which would represent a negative number as the negation (NOT operation) of its positive counterpart:

01010101      ->    +85
10101010      ->    -85

This system... had the same problem: 0 could be represented as 00000000 (+0) and 11111111 (-0). Then came some smarter folks who created Two's Complement, which keeps the negation step of the earlier method and then adds 1, thereby removing that pesky -0 and giving us a shiny new number in our range: -128! So how does our range look now?

00000000     +0
00000001     +1
00000010     +2
...
01111110     +126
01111111     +127
10000000     -128
10000001     -127
10000010     -126
...
11111110     -2
11111111     -1

So, this should give an idea of what's happening when our little processor tries to add numbers to our variable:

 0110010     50                   01111111     127
+0000010    + 2                  +00000010    +  2
 -------     --                   --------     ---
 0110100     52                   10000001    -127
     ^                                  ^       ^
     |_ 1 + 1 = 10          129 in bin _|       |_ wait, what?!

Yep, if you review the range table above you can see that up to 127 (01111111) the binary was fine and dandy, nothing weird happening. But once the eighth bit is set, starting at -128 (10000000), the interpreted number no longer follows its plain binary magnitude but the Two's Complement representation. This means the binary representation, the bits in your variable, the 1s and 0s, the heart of our beloved char, does hold a 129... it's there, look at it! But the evil processor reads that as a measly -127 because the variable HAD to be signed, undermining all its positive potential for a smelly shift through the real number line in the Euclidean space of dimension one.

It's **undefined behavior**. This may apply for a run, but it's not a rule. Please don't state things that aren't true. — Luchian Grigore, Feb 03 '12 at 08:15
@LuchianGrigore: The question was very clear: "I get the output as -127. What does it mean?" I answered why he got -127. I did not say it was good practice or that he should use it at every chance he got. — whtlnv, Feb 03 '12 at 08:19
You didn't. You just assumed this is what happens. You don't know his platform, or his compiler. Anything can happen. That's what UB means. The rest is just guessing. — Luchian Grigore, Feb 03 '12 at 08:23
Plus, `129 goes over 127 by 2, so it lands in -127` is a very **very** bad explanation. You're not saying anything about encoding, bit representation or anything (which might be better, although still UB). — Luchian Grigore, Feb 03 '12 at 08:24

Luchian Grigore · Answer 2 · 2012-02-03T08:52:08.697

It means you ran into undefined behavior.

Any outcome is possible.

char ch=129; is UB because 129 is not a representable value for a char for your specific setup.

edited Feb 03 '12 at 08:52

answered Feb 03 '12 at 08:05

Luchian Grigore

It's not undefined because 129 is not representable as a char. It's undefined because 129 CAN'T be stored in 7 bits – whtlnv Feb 03 '12 at 08:23
@whitelionV the standard says nothing of bits. It says that attempting to store anything that is outside the range representable by the type is undefined behavior, which is what I said. Which is true. – Luchian Grigore Feb 03 '12 at 08:29
The standard DOES say that a char will hold at least 8-bits, and by CONVENTION (not by implementation) you should use it to hold characters. If any processor will compile an unsigned char as a 16-bit variable IT WILL print 129 and all the way to 32,767 – whtlnv Feb 03 '12 at 08:46
In *this particular case* 129 isn't representable in `char` because apparently here `char` is an 8-bit signed type. However, if `char` is bigger than 8 bits or is unsigned (both of which are allowed per the C standard), 129 can be represented in `char`. So, it would be correct to say that 129 is conditionally representable in `char`. – Alexey Frunze Feb 03 '12 at 08:49

score 1 · Answer 3 · edited May 23 '17 at 11:44

On your system, a char holding 129 has the same bits as the 8-bit signed integer -127. An unsigned 8-bit integer goes from 0 to 255, and a signed 8-bit integer from -128 to 127.

Related (C++):

You may also be interested in reading the nice top answer to What is an unsigned char?

As @jmquigley points out, this is strictly undefined behavior and you should not rely on it. See "Allowing signed integer overflows in C/C++".

answered Feb 03 '12 at 07:58

Johan Lundberg

I think you mean that a signed character goes from -128 to +127? – Some programmer dude Feb 03 '12 at 08:00
It's not guaranteed to be the same bits. It's undefined behavior. – jmq Feb 03 '12 at 08:13
I'm not sure that said top answer in that link applies to C. Correct me if I'm wrong, but I think character literals in C are of type int, but of type char in C++? – Lundin Feb 03 '12 at 09:40

score 1 · Answer 4 · answered Feb 03 '12 at 08:01

Your char is most likely an 8-bit signed integer that is stored using Two's complement. Such a variable can only represent numbers between -128 and 127. If you do "127+1" it wraps around to -128. So 129 is equivalent to -127.

David Grayson

score 1 · Answer 5 · answered Feb 03 '12 at 08:49

This comes from the fact that a char is coded on one byte, so 8 bits of data.

In fact, a signed char has its value coded on 7 bits plus one bit for the sign, whereas an unsigned char has all 8 bits of data for its value.

This means:

Taking abcdefgh as 8 bits respectively (a being the leftmost bit, and h the rightmost), the value is encoded with a for the sign and bcdefgh in binary format for the real value:

42(decimal) = 101010(binary) stored as : abcdefgh 00101010

When using this value from the memory : a is 0 : the number is positive, bcdefgh = 0101010 : the value is 42

What happens when you put 129 :

129(decimal) = 10000001(binary) stored as : abcdefgh 10000001

When using this value from the memory : a is 1 : the number is negative, so we should subtract one and invert all bits in the value, so (bcdefgh - 1) inverted = 1111111 : the value is 127. The number is -127.

score 0 · Answer 6 · answered Feb 03 '12 at 07:59

The char type is an 8-bit signed integer. If you interpret the representation of unsigned byte 129 in the two's complement signed representation, you get -127.

Lukáš Lalinský

char isn't necessarily signed, whether it is signed or unsigned is implementation-defined behavior. In this particular implementation however, it seems to be signed. – Lundin Feb 03 '12 at 08:30

score 0 · Answer 7 · answered Feb 03 '12 at 07:59

The type char can be either signed or unsigned, it's up to the compiler. Most compilers have it as signed.

In your case, the compiler silently converts the integer 129 to its signed variant, and puts it in an 8-bit field, which yields -127.

Some programmer dude

score 0 · Answer 8 · answered Feb 03 '12 at 08:00

char is 8 bits, signed. It can only hold values -128 to 127. When you try and assign 129 to it you get the result you see because the bit that indicates signing is flipped. Another way to think of it is that the number "wraps" around.

Brian Roach

score 0 · Answer 9 · edited Jun 20 '20 at 09:12

Whether a plain char is signed or unsigned, is implementation-defined behavior. This is a quite stupid, obscure rule in the C language. int, long etc are guaranteed to be signed, but char could be signed or unsigned, it is up to the compiler implementation.

On your particular compiler, char is apparently signed. This means, assuming that your system uses two's complement, that it can hold values of -128 to 127.

You attempt to store the value 129 in such a variable. This leads to undefined behavior, because you get an integer overflow. Strictly speaking, anything can happen when you do this. The program could print "hello world" or start shooting innocent bystanders, and still conform to ISO C. In practice, most (all?) compilers will however implement this undefined behavior as "wrap around", as described in other answers.

To sum it up, your code relies on two different behaviors that aren't well defined by the standard. Understanding how the result of such unpredictable code ends up in a certain way has limited value. The important thing here is to recognize that the code is obscure, and learn how to write it in a way that isn't obscure.

The code could for example be rewritten as:

unsigned char ch = 129;

Or even better:

#include <stdint.h>
...
uint8_t ch = 129;

As a rule of thumb, make sure to follow these rules in MISRA-C:2004:

6.1 The plain char type shall be used only for the storage and use of character values.

6.2 signed and unsigned char type shall be used only for the storage and use of numeric values.
