
I see lots of answers about signed/unsigned char but not this exact question so feel free to close/duplicate if there is already an answer.

I'm aware that in C and C++ the data type `char` can be either signed or unsigned. I know that different platforms choose differently; however, x86 and all the other platforms I've personally used chose `char` to be signed.

It seems to me that there are some minor advantages to choosing an unsigned `char`: for example, you can use the value directly as an array index if you wish to categorize values. Presumably, though, there are reasons, either in the language or in the target architecture, that make signed a better choice.

What are those reasons?

jcoder
    Consistency? `int` = `signed int`, so consistency requires `char` = `signed char` (just a guess) – anatolyg May 10 '16 at 15:45
    related: http://stackoverflow.com/questions/15533115/why-dont-the-c-or-c-standards-explicitly-define-char-as-signed-or-unsigned – NathanOliver May 10 '16 at 15:46
  • One reason I ask is that I'm playing with making a compiler for "fun" and I'm finding it hard to see why people chose "signed".... Although the advantages of unsigned are minor they seem to exist. I'm sure there must be a reason. – jcoder May 10 '16 at 15:47
  • If your processor architecture only has sign-extending loads for memory locations of less than register size, keeping `char` signed improves performance. I'm not aware of any architecture that does this though. – EOF May 10 '16 at 15:48
  • If you have a smartphone, at least one platform you use **does** have `char` unsigned. – too honest for this site May 10 '16 at 15:48
    @anatolyg `char` and `signed char` are different types. – Barry May 10 '16 at 15:49
  • Ah @NathanOliver the link you posted gives a good reason in the comment. (Comparing variables stored in char with int constants)... Ok, happy to close this as a duplicate :P – jcoder May 10 '16 at 15:49
  • @EOF: The other way 'round: ARM had an unsigned byte-load only in early revisions, thus the preference for unsigned `char`. That changed with ARMv4 (ARM7). – too honest for this site May 10 '16 at 15:50
  • @jcoder I would not close as a duplicate(which is why I didn't). I just wanted to point you to a post you may not have seen. – NathanOliver May 10 '16 at 15:51
  • @NathanOliver: I very well think this is a dup. May be asked a bit differently, but the other question is clearly broader, as it asks for both variants. – too honest for this site May 10 '16 at 15:54
  • Types `short`, `int` and `long` are all signed by default. `char` is another integer type and (often) follows the same pattern. Just as with other types, if you want the expanded positive range, make it `unsigned`. – Weather Vane May 10 '16 at 15:56
  • @WeatherVane: Not on the most widely used 32/64-bit platform: ARM. And also not on some MCUs, like the MSP430, because they have no sign-extending load or similar instruction (consider an 8-bit CPU). – too honest for this site May 10 '16 at 15:58

1 Answer


The signed keyword was added in C89. Prior to that point, if you made char and unsigned char the same, there was no way to access a signed char-sized type. Therefore, most early C ABIs defined char to be signed. (Even then, though, there were exceptions — C89 would have mandated that char be signed if there hadn't been any exceptions.)

Since that time, we have had a continuous feedback loop between code assuming that char is signed (because the programmers have never seen an ABI where it isn't, so why bother typing an extra word?) and ABIs defining char as signed to ensure compatibility with as much existing code as possible.

A greenfield language design would make char and int8_t separate fundamental types, but C's importance these days rests on the huge body of existing code; you're not likely to see this change ever.

(Also keep in mind that in 1989 it was still quite common for computers and applications to support only 7-bit ASCII. Thus, the inconveniences of signed char for textual data were much less obvious. Those look-up tables you mention would only have had 128 entries. Having char be 8-bit signed is actually more convenient for programs that work with 7-bit text and use the eighth bit as a per-character flag.)

zwol
  • This also left room for optimisations. E.g. some MCUs did not and do not (e.g. MSP430) have signed or unsigned byte-load instructions. Anyway, I think this is a duplicate of another question. – too honest for this site May 10 '16 at 15:56
  • Yes, totally agree with the pangs of `int8_t` being same fundamental type as `char`. I can't even express the frustration when I forget about this (as I do!), and print it in some diagnostic (expecting numeric output), only to find some unprintable random thing in my log. Now I have to redo the whole testcase after making sure I cast! – SergeyA May 10 '16 at 16:04