Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
-
the answer is simple: your assumption that chars aren't used for math is wrong. I frequently use `uint8_t` and `int8_t` in system code, which are often defined as unsigned and signed char respectively. – Evan Teran Jan 16 '09 at 18:45
-
I think part of this is me forgetting that there's no distinct byte/unsigned byte type in C. – dsimcha Jan 16 '09 at 19:16
-
possible duplicate of [Difference between signed / unsigned char](http://stackoverflow.com/questions/4337217/difference-between-signed-unsigned-char) – Ciro Santilli OurBigBook.com Jun 02 '15 at 18:37
9 Answers
It won't make a difference for strings. But in C you can use a char to do math, and then it will make a difference.

In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because there is no `byte` type by default in C.
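For instance, a minimal sketch of the kind of small-integer math where the (implementation-defined) signedness of plain char changes the result:

#include <stdio.h>

int main(void)
{
    /* Using plain char as a small integer, as is common on 8-bit targets. */
    char counter = 100;
    counter += 50; /* 150 does not fit in a signed char */

    /* Prints 150 if char is unsigned; typically -106 (on two's complement)
       if char is signed, since the conversion is implementation-defined. */
    printf("%d\n", counter);
    return 0;
}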

-
He's saying there's no type called `byte`, not that no type is a byte. – Eagle-Eye Mar 19 '13 at 17:21
-
There is the `uint8_t` type in C. Unless the system is completely obscure, bytes are 8 bits. – Lundin Apr 15 '15 at 11:07
-
Does this explain why signed char is even used, when all characters can be represented in unsigned char? (I presume the negative values don't encode to anything.) IMO unsigned char should be the default type instead of signed char. – einstein Jan 10 '22 at 10:55
In terms of the values they represent:

unsigned char:

- spans the value range 0..255 (00000000..11111111)
- values overflow around the low edge as: 0 - 1 = 255 (00000000 - 00000001 = 11111111)
- values overflow around the high edge as: 255 + 1 = 0 (11111111 + 00000001 = 00000000)
- the bitwise right shift operator (>>) does a logical shift: 10000000 >> 1 = 01000000 (128 / 2 = 64)

signed char:

- spans the value range -128..127 (10000000..01111111)
- values overflow around the low edge as: -128 - 1 = 127 (10000000 - 00000001 = 01111111)
- values overflow around the high edge as: 127 + 1 = -128 (01111111 + 00000001 = 10000000)
- the bitwise right shift operator (>>) does an arithmetic shift: 10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (except for right shifts).
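As a minimal sketch of the shift difference (assuming an 8-bit, two's complement platform; the exact result of right-shifting a negative signed value is implementation-defined):

#include <stdio.h>

int main(void)
{
    unsigned char u = 0x80; /* 10000000 = 128 */
    signed char s = 0x80;   /* 10000000 = -128 after implementation-defined conversion */

    /* Logical shift: the vacated high bit is filled with 0. */
    printf("%d\n", (unsigned char)(u >> 1)); /* 64  (01000000) */

    /* Arithmetic shift (typical): the sign bit is replicated. */
    printf("%d\n", (signed char)(s >> 1));   /* -64 (11000000) */
    return 0;
}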
Update
Some implementation-specific behaviour mentioned in the comments:
- char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means that it can act like a signed or an unsigned type.
- Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.

-
Hmmm ... isn't the overflow behavior of signed types implementation defined? – Martin Ba Feb 15 '13 at 09:43
-
@MartinBa I don't know. Do you know of any instances where it is different or are you merely asking? My intuition tells me that the behaviour should be consistent because I wouldn't imagine a C implementation doing anything beyond what the underlying CPU does for some ADD machine instruction -- it's the same bitwise addition of bits, as far as my limited knowledge of CPUs serves me. – Ates Goral Feb 15 '13 at 15:56
-
[In contrast, the C standard says that signed integer overflow leads to undefined behavior where a program can do anything](http://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Integer-Overflow-Basics.html#Integer-Overflow-Basics). For historical reasons the C standard also allows implementations with ones' complement or signed magnitude arithmetic. – Martin Ba Feb 15 '13 at 16:10
-
@MartinBa Thanks for that info! I wonder if any modern (relevant) implementation do anything beyond regular CPU-level arithmetic (i.e. the overflow behaviour is defined and consistent across implementations)... – Ates Goral Feb 15 '13 at 17:35
-
@Ates: Yes: e.g. if your loop index is a signed integer type, optimizers will generate more efficient loop code because they don't have to worry about doing the expected thing in the face of overflow. – Feb 18 '15 at 05:45
-
@Ates Goral: I like your answer but I think you should mention that char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means that it can act like a signed or unsigned type. – FrozenTarzan May 03 '16 at 07:18
-
Nice. Seeing how that high-order bit is handled differently between the two types when the value is overflowed did the trick for me! – Tom Russell Aug 18 '22 at 21:58
#include <stdio.h>

int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;           /* implementation-defined value if plain char is signed */
    signed char sa = 'A';
    signed char sb = 0xFF;   /* typically -1 on two's complement */
    unsigned char ua = 'A';
    unsigned char ub = 0xFF; /* always 255 */

    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.

-
2"By default, char is signed" As the OP said - that is implementation defined. – Steve Fallows Jan 16 '09 at 18:13
There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too-big or too-small integer, and char is signed, the resulting value is implementation-defined, or (in C) a signal could even be raised, as for all signed types. Contrast that with the case when you assign something too big or too small to an unsigned char: the value wraps around, and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte in the sense of a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c is too big for char to represent, and the machine uses two's complement. Many implementations behave such that when you assign a too-big value to a char, the bit pattern doesn't change. If an int can represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value passed would be negative; promoting to int retains that sign, and you get a negative result. However, if char is unsigned, then the value is unsigned, and promoting it to an int yields a positive int. If you use unsigned char, you get precisely defined behavior both for the assignment to the variable and for passing it to printf, which will then print something positive.
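A minimal sketch of that difference (the signed-char result assumes a typical two's complement implementation):

#include <stdio.h>

int main(void)
{
    char c = 0xFF;          /* implementation-defined if char is signed; commonly -1 */
    unsigned char u = 0xFF; /* always 255 */

    /* Both arguments are promoted to int before reaching printf. */
    printf("%d\n", c); /* -1 on a signed-char platform, 255 on an unsigned-char one */
    printf("%d\n", u); /* always 255 */
    return 0;
}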
Note that char, unsigned char and signed char are all at least 8 bits wide. There is no requirement that char be exactly 8 bits wide; that's true on most systems, but on some you will find 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.

Another difference is that in C, an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range over 0..2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values: even if you know how your compiler implements the sign (two's complement or one of the other options), there may be unused padding bits in it. In C++, none of the three character types has padding bits.

"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8-bit EBCDIC.)

When the C language was designed and implemented, this was a significant issue. (For various reasons, like data transmission over serial modem devices.) The extra bit has uses like parity.

A "signed character" happens to be perfect for this representation.

Binary data, OTOH, simply takes the value of each 8-bit "chunk" of data, so no sign is needed.

Signedness works pretty much the same way in `char`s as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and `char`s are rather tied to bytes due to the definitions of `char` and `sizeof(char)`. The `CHAR_BIT` macro, defined in `<limits.h>` or C++'s `<climits>`, will tell you how many bits are in a `char`.)

As for why you'd want a character with a sign: in C and C++, there is no standard type called `byte`. To the compiler, `char`s are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that `char` to be a one-byte number, and in those cases (particularly given how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain `char` is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that `char` really is a character, and is intended to be used as text.

I used to do that, rather. Now the newer versions of C and C++ have `int_least8_t` and `uint_least8_t` (currently typedef'd in `<stdint.h>` or `<cstdint>`), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned `char` types anyway).
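A minimal sketch of inspecting these properties on a given platform:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* How many bits a char (and hence a C "byte") has here. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);

    /* The explicitly numeric one-byte types; typically just
       typedefs for signed char and unsigned char. */
    int_least8_t s = -5;
    uint_least8_t u = 250;
    printf("s = %d, u = %u\n", (int)s, (unsigned)u);
    return 0;
}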

Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
- converting to a larger int
- comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
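A small sketch of both cases (the results shown assume an 8-bit platform where plain char is signed):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Case 1: converting to a larger int sign-extends when char is signed. */
    char c = (char)0xE9; /* e.g. 'é' in Latin-1 */
    int i = c;
    printf("%d\n", i); /* -23 if char is signed, 233 if unsigned */

    /* Case 2: comparison; memcmp compares bytes as unsigned char,
       while a naive char subtraction may not agree for 8-bit data. */
    char a = (char)0x80, b = 0x7F;
    printf("naive:  %d\n", a - b);             /* negative if char is signed */
    printf("memcmp: %d\n", memcmp(&a, &b, 1)); /* positive: compares unsigned */
    return 0;
}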

The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of the char, c could be one of two values. If chars are unsigned then c will be (char)162. If they are signed then it's an overflow case, since the max value for a signed char is 127, and the result of converting 162 back to a signed char is implementation-defined. I'm guessing most implementations would just keep the bit pattern, which gives (char)-94.

One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ASCII char. Of course, it's not portable, so not very useful.
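A quick sketch contrasting the trick with the standard isprint (which the comments below recommend instead); note the cast, since passing a negative char to isprint is undefined behavior, another place where signedness matters:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char c = 0x7F; /* DEL: not a visible character */
    printf("c >= ' ':  %d\n", c >= ' ');                  /* 1: the trick accepts DEL */
    printf("isprint(): %d\n", isprint((unsigned char)c)); /* 0 */
    return 0;
}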
-
The standard library contains a function called isprint, which checks whether a character is printable, so this trick isn't useful at all. – Nate879 Jan 16 '09 at 19:09
-
But ASCII code 127 is "delete", not a visible graphical character, and your proposed check will merrily accept that. It's not even portable in the sense of being reliably interpreted by terminals. – underscore_d May 11 '17 at 15:44