significance of negative values of type char in C

Question

chars in 'C' are used to represent characters.
Numbers representing characters in all code pages are always positive.

What is the use of having signed characters? Are negative values in chars used only as integral values, in an integral data type smaller than int and short? Do they have no other interpretation (like positive values in chars representing characters)?

score 12 · Answer 1 · edited Dec 21 '09 at 15:56

chars in 'C' are used to represent characters.

Not always. chars are also used to represent bytes; they are the only type in C with a known size.

edited Dec 21 '09 at 15:56 by Drew Dormann · answered Dec 21 '09 at 15:51 by Martin Beckett

If you mean that they're the only type where sizeof(type) is guaranteed to be 1, yes. They're not guaranteed to be 8 bits. – David Thornley Dec 21 '09 at 16:01

The only thing C tells about the size of a char is that it is 1. It doesn't say what the units of 1 are. – Richard Pennington Dec 21 '09 at 16:02

PDP-10, for example, used 9-bit bytes. – el.pescado - нет войне Dec 21 '09 at 16:05

C89 and C99 standards do, however, guarantee char is at least 8 bits. Might be more, might be signed or unsigned. – Dec 21 '09 at 16:27
@David, yes the main use of char is to have a size of 1 byte when you are shipping binary objects around. Number of bits doesn't matter. – Martin Beckett Dec 21 '09 at 19:58

@DavidThornley They are guaranteed to be CHAR_BIT bits, and CHAR_BIT is kind of known. – kiwixz Mar 24 '16 at 23:10
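
The points in these comments can be checked directly. The following is a minimal sketch, not taken from the thread; the helper names are made up for illustration. It relies only on `sizeof` and on `CHAR_BIT` from `<limits.h>`.

```c
#include <limits.h>
#include <stddef.h>

/* Size of a plain char, in bytes. The standard fixes this at 1. */
size_t char_size_in_bytes(void) { return sizeof(char); }

/* Number of bits in that byte: at least 8, possibly more. */
int bits_per_char(void) { return CHAR_BIT; }

/* Converts an object size from bytes to bits for this implementation. */
size_t size_in_bits(size_t size_in_bytes) { return size_in_bytes * CHAR_BIT; }
```

On most current platforms `bits_per_char()` returns 8, but portable code should use `CHAR_BIT` rather than assume it.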

score 5 · Answer 2 · answered Dec 21 '09 at 16:07

Only characters of the basic execution character set are guaranteed to be nonnegative (C99, 6.2.5 §3):

An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

You also have to distinguish between the 'plain' char type and the types signed char and unsigned char: signed char and unsigned char are ordinary integer types, for which the following holds (C99, 6.2.5 §5):

An object declared as type signed char occupies the same amount of storage as a "plain" char object.

Further, if `char` is signed on your platform and you read a character with a codepoint greater than `CHAR_MAX` (say a character like æ in ISO-8859-1, which has codepoint `0xE6`), you're quite likely to get a negative char value. — caf, Dec 21 '09 at 23:10
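
The effect described in this comment can be sketched as follows. This is an illustration only: the function names are hypothetical, and the example assumes 8-bit chars, so that `0xE6` does not fit in a signed `char`.

```c
#include <limits.h>

/* First byte of s as seen through plain char.
   For a byte such as 0xE6 this is negative when plain char is signed. */
int as_plain_char(const char *s) { return s[0]; }

/* The same byte viewed as unsigned char: always in 0..UCHAR_MAX. */
int as_unsigned_char(const char *s) { return (unsigned char)s[0]; }

/* Nonzero when plain char is signed on this implementation. */
int plain_char_is_signed(void) { return CHAR_MIN < 0; }
```

With a signed, 8-bit, two's-complement `char`, `as_plain_char("\xE6")` is typically -26, while `as_unsigned_char("\xE6")` is 230 (`0xE6`) either way.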

score 4 · Answer 3 · answered Dec 21 '09 at 16:05

Numbers representing characters in all code pages are always positive.

Erm... wrong!?

From the C99 standard, emphasis mine:

If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.

It is not guaranteed that all valid characters of all code pages are nonnegative. Whether char is signed or unsigned is implementation-defined!
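
A small check makes the scope of that guarantee concrete. The helper below is hypothetical, not part of the answer. It reports whether every character of a string is nonnegative when read as plain `char`. That holds for strings made only of basic execution characters; for other bytes it depends on the implementation.

```c
/* Returns 1 if every character of s is nonnegative as a plain char, else 0. */
int all_nonnegative(const char *s) {
    for (; *s; s++) {
        if (*s < 0)
            return 0;
    }
    return 1;
}
```

For a string such as `"\xE6"`, the result is 0 where plain `char` is signed and 8 bits wide, and 1 where it is unsigned.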

score 2 · Answer 4 · answered Dec 21 '09 at 15:59

From Jack Klein's Home Page:

Signed char can hold all values in the range of SCHAR_MIN to SCHAR_MAX, defined in limits.h. SCHAR_MIN must be -127 or less (more negative), and SCHAR_MAX must be 127 or greater. Note that many compilers for processors which use a 2's complement representation support SCHAR_MIN of -128, but this is not required by the standards.

From what I can tell, there's no official "meaning" of signed char. However, one thing to be aware of is that all the normal ASCII characters fall in the 0-127 range. Therefore, you can use the signed char type to restrict legal values to the 0-127 range, and define anything less than 0 as an error.

For example, if I had a function that searches some ASCII text and returns the most frequently occurring character, perhaps I might define a negative return value to mean that there are two or more characters tied for most frequent. This isn't necessarily a good way to do things, it's just an example off the top of my head.
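
A rough sketch of the function described above, under stated assumptions: the name `most_frequent_ascii` is invented here, only byte values 0..127 are counted, and -1 is returned both for ties and for input with no ASCII characters at all.

```c
#include <stddef.h>

/* Hypothetical helper: returns the most frequent ASCII character in text,
   or -1 when there is no single winner (a tie, or no ASCII characters). */
signed char most_frequent_ascii(const char *text) {
    size_t counts[128] = {0};

    /* Read bytes as unsigned char so they are safe to use as indexes. */
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        if (*p < 128)
            counts[*p]++;
    }

    size_t best = 0;
    int winner = -1;
    int tied = 0;
    for (int c = 0; c < 128; c++) {
        if (counts[c] > best) {
            best = counts[c];
            winner = c;
            tied = 0;
        } else if (best > 0 && counts[c] == best) {
            tied = 1;
        }
    }
    return (signed char)((winner < 0 || tied) ? -1 : winner);
}
```

A caller would test the result with `< 0` before treating it as a character.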

Very simple: there are 3 types of `char`: `unsigned char`, `signed char` and `char`. The former two are explicit and used for manipulating the smallest numeric data type. However, whether plain `char` is signed or unsigned is implementation-defined. In summary, when the sign is significant, add the qualifier. – Thomas Matthews, Dec 21 '09 at 17:57

score 2 · Answer 5 · answered Dec 21 '09 at 16:37

Just beware of using plain chars as array indexes.

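#include <limits.h>
#include <stdio.h>

int main(void) {
    char buf[10000];
    unsigned charcount[UCHAR_MAX + 1] = {0}; /* one slot per unsigned char value */
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return 1;
    for (char *p = buf; *p; p++) {
        /* charcount[*p]++ would go BOOM: where char is signed, *p can be negative */
        charcount[(unsigned char)*p]++;
    }
    return 0;
}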

score 1 · Answer 6 · answered Dec 21 '09 at 16:49

It's worth noting that char is a distinct type from both signed char and unsigned char.
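
One way to see this is C11's `_Generic`, which selects on the exact type of an expression. The sketch below is illustrative only and assumes a C11 compiler; the macro and function names are made up. It compiles only because the three types are distinct: a `_Generic` list may not contain two compatible types.

```c
/* Maps an expression to a small code according to its exact type. */
#define CHAR_KIND(x) _Generic((x), \
    char: 0, \
    signed char: 1, \
    unsigned char: 2, \
    default: 3)

int kind_of_plain(void)    { char c = 'a';          return CHAR_KIND(c); }
int kind_of_signed(void)   { signed char c = 'a';   return CHAR_KIND(c); }
int kind_of_unsigned(void) { unsigned char c = 'a'; return CHAR_KIND(c); }

/* In C, a character constant such as 'a' has type int, so it hits default. */
int kind_of_literal(void)  { return CHAR_KIND('a'); }
```

Plain `char` has the same range as one of the other two, but it never selects their branch.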

answered Dec 21 '09 at 16:49 by el.pescado - нет войне

score 0 · Answer 7 · answered Dec 21 '09 at 16:00

In C and C++ chars can be signed or unsigned. A char variable can be used to hold a small integer value. This is useful for several reasons:

On small machines, e.g. an 8-bit micro, it might allow more efficient access and manipulation.
If you want to have a large array of small values, say 100K, you can save a bunch of memory by using an array of chars rather than, e.g., ints.
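
The memory argument can be made concrete with `sizeof`. This is a sketch with invented names; the element count of 100000 stands in for the "100K" above, and the exact size of `int` depends on the platform.

```c
#include <stddef.h>

#define SAMPLE_COUNT 100000 /* "100K" small values */

static signed char small_samples[SAMPLE_COUNT]; /* one byte per element */
static int         wide_samples[SAMPLE_COUNT];  /* sizeof(int) bytes per element */

/* Total storage, in bytes, of each array. */
size_t small_bytes(void) { return sizeof small_samples; }
size_t wide_bytes(void)  { return sizeof wide_samples; }
```

Where `int` is 4 bytes, the `int` array takes four times the space of the `signed char` array.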

In C, a character literal is an integer constant. '0' is equal to 48.

answered Dec 21 '09 at 16:00 by Richard Pennington

Nit: `'0'` may or may not be 48. It's a small integer, able to fit in a `char`, but it need not be 48. – Alok Singhal Dec 21 '09 at 16:12
You got me. I forgot about EBCDIC. ;-) – Richard Pennington Dec 21 '09 at 16:24
Richard, there's always the DeathStation 9000 [ http://dialspace.dial.pipex.com/prod/dialspace/town/green/gfd34/art/ ] :) – pmg Dec 21 '09 at 17:56
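
Following up on the nit about `'0'`: code that needs a digit's value can avoid the number 48 entirely. The sketch below uses a hypothetical helper name and relies on the C guarantee that the characters `'0'` through `'9'` have consecutive values in every execution character set.

```c
/* Returns the value 0..9 of a decimal digit character, or -1 otherwise.
   Written with '0' rather than 48 so it does not depend on ASCII. */
int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

The same code gives the right answer on an EBCDIC system, where `'0'` is not 48.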

score 0 · Answer 8 · answered Dec 21 '09 at 16:33

In C, a char (including signed char and unsigned char) is used to store a byte, which the C standard defines as a small integer at least 8 bits in size.

Having signed and unsigned bytes is as useful as having larger integers. If you're storing a very large number of small numbers (0..255 for unsigned, -127..127 for signed[1]) in an array, you may prefer to use bytes for them rather than, say, short ints, to save space.

Historically, a byte and a text character were pretty much the same thing. Then someone realized there are more languages than English. These days, text is much more complicated, but it is too late to change the name of the char type in C.

[1] -128..127 for machines with two's complement representation for negative numbers, but the C standard does not guarantee that.

Actually, the term "byte" is defined nowhere. In C/C++ chars are sequences of bits, and so is any object. But `int` is "the natural size suggested by the architecture of the execution environment" (C++11 §3.9.1/2). So the standard may define the term machine-word, but not machine-byte. `char` is not even the smallest addressable memory unit. For example, to define a 4-bit char, use `struct char4 { unsigned int c : 4; }`. – Andreas Spindler, Oct 19 '12 at 08:25
