12

I recently read that the differences between

char
unsigned char

and

signed char

are platform specific.
I can't quite get my head round this. Does it mean the bit sequence can vary from one platform to the next, i.e. on platform 1 the sign is the first bit, and on platform 2 the sign could be at the end? How would you code against this?

Basically my question comes from seeing this line:

typedef unsigned char byte;

I don't understand the relevance of the signedness.

Adam Naylor

6 Answers

20

Let's assume that your platform has eight-bit bytes, and suppose we have the bit pattern 10101010. To a signed char, that value is −86. For unsigned char, though, that same bit pattern represents 170. We haven't moved any bits around; it's the same bits, interpreted two different ways.
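To make the two interpretations concrete, here is a minimal sketch (assuming an 8-bit byte and two's-complement representation, which covers common compilers; the conversion to signed char is technically implementation-defined before C++20):

#include <cstdio>

int main() {
    unsigned char u = 0xAA;                         // bit pattern 10101010
    signed char   s = static_cast<signed char>(u);  // the same bits, reinterpreted

    // The values print differently even though the underlying bits are equal.
    std::printf("as unsigned char: %d\n", u);   // 170
    std::printf("as signed char:   %d\n", s);   // -86 on two's-complement systems
}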

Now for char. The standard doesn't say which of those two interpretations should be correct. A char holding the bit pattern 10101010 could be either −86 or 170. It's going to be one of those two values, but you have to know the compiler and the platform before you can predict which it will be. Some compilers offer a command-line switch to control which one it will be. Some compilers have different defaults depending on what OS they're running on, so they can match the OS convention.
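If you want to know which choice your compiler made, one way is to look at CHAR_MIN from <climits>: it is 0 when plain char is unsigned and negative when plain char is signed. A small sketch:

#include <climits>
#include <cstdio>

int main() {
    // CHAR_MIN is 0 when plain char is unsigned, and negative (usually -128)
    // when plain char is signed.
#if CHAR_MIN < 0
    std::puts("plain char is signed on this compiler");
#else
    std::puts("plain char is unsigned on this compiler");
#endif
    // GCC and Clang accept -fsigned-char / -funsigned-char to override the
    // default; MSVC's /J switch makes plain char unsigned.
}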

In most code, it really shouldn't matter. They are treated as three distinct types for the purposes of overloading. Pointers to one of those types aren't compatible with pointers to another of them. Try calling strlen with a signed char* or an unsigned char*; it won't work.
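For example, a sketch of that pointer incompatibility (the exact diagnostics vary by compiler):

#include <cstring>

int main() {
    signed char   sc[] = { 'h', 'i', '\0' };
    unsigned char uc[] = { 'h', 'i', '\0' };
    (void)sc;   // silence unused-variable warnings

    // std::strlen takes a const char*, and the three character types are
    // distinct, so neither pointer converts to char* implicitly:
    // std::strlen(sc);   // error: cannot convert signed char*   to const char*
    // std::strlen(uc);   // error: cannot convert unsigned char* to const char*

    // An explicit cast is needed when you really do hold character data:
    std::size_t n = std::strlen(reinterpret_cast<const char*>(uc));
    return static_cast<int>(n);
}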

Use signed char when you want a one-byte signed numeric type, and use unsigned char when you want a one-byte unsigned numeric type. Use plain old char when you want to hold characters. That's what the programmer was thinking when writing the typedef you're asking about. The name "byte" doesn't have the connotation of holding character data, whereas the name "unsigned char" has the word "char" in its name, and that causes some people to think it's a good type for holding characters, or that it's a good idea to compare it with variables of type char.

Since you're unlikely to do general arithmetic on characters, it won't matter whether char is signed or unsigned on any of the platforms and compilers you use.
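As a small illustration of that naming convention, here is one (assumed, hypothetical) way the typedef from the question might be used for raw, non-textual data; the byte order it prints depends on the platform:

#include <cstdio>

typedef unsigned char byte;   // the typedef from the question

int main() {
    // Look at a value as raw bytes rather than as text. "byte" signals
    // intent, and unsigned keeps each value in 0..255 so the hex dump
    // below behaves predictably.
    unsigned int value = 0xDEADBEEFu;
    const byte* p = reinterpret_cast<const byte*>(&value);

    for (unsigned i = 0; i < sizeof value; ++i)
        std::printf("%02X ", static_cast<unsigned>(p[i]));   // platform byte order
    std::printf("\n");
}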

Rob Kennedy
18

You misunderstood something. signed char is always signed. unsigned char is always unsigned. But whether plain char is signed or unsigned is implementation specific - that means it depends on your compiler. This is different from the int types, which are all signed (int is the same as signed int, short is the same as signed short). A more interesting point is that char, signed char and unsigned char are treated as three distinct types for the purposes of function overloading. That means you can have three function overloads in the same compilation unit:

void overload(char);
void overload(signed char);
void overload(unsigned char);

For the int types it's the opposite; you can't have

void overload(int);
void overload(signed int);

because int and signed int are the same type.
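As a quick sketch of how the three distinct char overloads resolve in practice (expanding the declarations above into definitions):

#include <iostream>

void overload(char)          { std::cout << "char\n"; }
void overload(signed char)   { std::cout << "signed char\n"; }
void overload(unsigned char) { std::cout << "unsigned char\n"; }

int main() {
    char          c = 'a';
    signed char   s = 'a';
    unsigned char u = 'a';

    overload(c);   // picks overload(char)
    overload(s);   // picks overload(signed char)
    overload(u);   // picks overload(unsigned char)
}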

Tadeusz Kopec for Ukraine
  • I think this clarifies things greatly but I'd like some more feedback before I accept the answer – Adam Naylor Jul 31 '09 at 11:32
  • Re 'int is the same as signed int' etc.: Unless you use it as the type of a bitfield! – Richard Corden Jul 31 '09 at 12:05
  • +1 very good answer and learnt a lot from it. Won't char take one of signed char or unsigned char on any one platform? In which case, how can the overload work? – dubnde Jul 31 '09 at 12:24
  • Should have been "how does the overload work?". But I just read tkopec's answer again and it clearly mentions they are treated as distinct types. My bad – dubnde Jul 31 '09 at 12:28
3

It's more correct to say that it's compiler-specific and you should not count on char being signed or unsigned when using char without a signed or unsigned qualifier.

Otherwise you would face the following problem: you write and debug the program assuming that char is signed by default, and then it is recompiled with a compiler that assumes otherwise, and the program's behaviour changes drastically. If you rely on this assumption only once in a while in your code, you risk unintended behaviour in cases which are only triggered under specific conditions and are very hard to detect and debug.
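Here is a minimal sketch of the kind of surprise that causes (assuming a byte value of 0xFF shows up in your data, e.g. read from a binary file or a network buffer):

#include <cstdio>

int main() {
    // Store the byte value 0xFF in a plain char. The result is
    // implementation-defined when char is signed (usually -1).
    char c = static_cast<char>(0xFF);

    // If char is unsigned, c holds 255 and the test below is true.
    // If char is signed, c holds -1 and is promoted to int as -1,
    // so the test is false.
    if (c == 0xFF)
        std::puts("plain char is unsigned here");
    else
        std::puts("plain char is signed here");
}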

sharptooth
  • Here's an example of surprising behaviour: http://stackoverflow.com/questions/1097130/in-c-left-shift-char-0xff-by-8-and-cast-it-to-int/ – sharptooth Jul 31 '09 at 11:31
  • A classic problem occurs with Turkish y-umlaut (ÿ, Unicode U+00FF) in ISO 8859-1, character code 255. If char is signed, it can be confused with EOF, which is normally -1. – Jonathan Leffler Jul 31 '09 at 12:06
2

Perhaps you are referring to the fact that the signedness of char is compiler / platform specific. Here is a blog entry that sheds some light on it:

Character types in C and C++

Karl Voigtland
0

Having a signed char is more of a fluke of how all the basic variable types are handled in C; generally it is not actually useful to have negative characters.

ewanm89
  • Many people would say 'generally it is not useful to have unsigned chars' That is why the signedness of char differs between implementations. – William Pursell Jul 31 '09 at 11:20
  • This is what I don't understand, surely unsigned chars are more useful than signed? – Adam Naylor Jul 31 '09 at 11:23
  • And have you actually ever assigned a negative value to a character? Wide character support and the like is more important now than characters with negative values. – ewanm89 Jul 31 '09 at 11:25
  • @Adam, it doesn't matter when there aren't enough characters to fill every bit of a byte in ANSI C/ISO C++ (the ASCII character set), so the sign bit is there more for good measure. – ewanm89 Jul 31 '09 at 11:26
  • @William Pursell: I've never felt that 'char' being signed was useful, whereas having them unsigned makes a lot of character (text) processing simpler. – Jonathan Leffler Jul 31 '09 at 12:10
  • @Jonathan Leffler: There is one situation where an unadorned 'char' is required to be signed: when 'char' and 'int' are the same size. Not all machines running C have 8-bit bytes; I've written C code for a platform with 16-bit bytes and 16-bit integers. The standard requires that a signed 'int' be able to hold all values that can be held by a 'char'. If 'char' were unsigned 16-bit and 'int' were signed 16-bit, that requirement would be violated. – supercat Dec 19 '10 at 02:34
  • @supercat: since EOF must be a negative int and should be distinct from all valid characters, and since functions like fgetc() return the character as an unsigned char converted to int, I think your environment with 16-bit char and 16-bit integer presents some unusual problems at the edge of the C language definition. You might get away with it in Unicode: U+FFFF is not a valid Unicode character. But it gets tricky. It is interesting to know there are such platforms - can you identify it? Is it an embedded computer system? – Jonathan Leffler Dec 19 '10 at 03:48
  • @Jonathan Leffler: I've written a TCP stack on "bare metal" on a TI DSP, where sizeof(int)==sizeof(char)==16 bits. I didn't use any library I/O routines; my TCP routines use the bottom 8 bits of each byte, so -1 works fine as an EOF indicator. Some other routines like my flash file system use 16-bit bytes, but each file has a read-EOF flag. – supercat Dec 20 '10 at 01:35
  • @Jonathan Leffler: Incidentally, it's worth noting that even in machines with 8-bit bytes, not all byte values are necessarily legal in text files. On many MS-DOS implementations, for example, a character 0x1A in a text file will be regarded as EOF, and anything past that will be ignored. So it would be perfectly plausible for a system with a signed 16-bit 'char' type to disallow -1 within a text file. – supercat Dec 20 '10 at 16:37
-6

A signed char is always 8 bits and always has the sign bit as the last bit.

An unsigned char is always 8 bits and doesn't have a sign bit.

A char is, as far as I know, always unsigned. Any compiler defaulting to a signed char would face a lot of incompatible programs.

Toad
  • A char is not always 8 bits. Historically, it was often 9. Currently, it is often 16 or 32. The number of bits in a char is CHAR_BIT, which is implementation dependent. – William Pursell Jul 31 '09 at 11:26
  • I don't believe this is correct... http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.4 clearly states that char == 1 byte and 1 byte == AT LEAST 8 bits? – Adam Naylor Jul 31 '09 at 11:28
  • gcc and msvc by default treat char as signed char. – nothrow Jul 31 '09 at 11:41
  • Yossarian, for GCC, the default depends on what platform it's running on. – Rob Kennedy Jul 31 '09 at 17:30