
Depending on the environment and compiler settings, plain char can be signed or unsigned by default, which means that on 8-bit two's complement systems the range of values for single-character constants can be either -128..127 or 0..255.
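A small sketch for checking which choice a given implementation made. It relies only on the `<limits.h>` macros `CHAR_BIT`, `CHAR_MIN` and `CHAR_MAX`; the function names are hypothetical. With GCC or Clang the result can be flipped with `-fsigned-char` / `-funsigned-char`, which is one of the "compiler settings" mentioned above.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed on this implementation, 0 if unsigned.
   <limits.h> defines CHAR_MIN as SCHAR_MIN when plain char is signed and
   as 0 when it is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Prints the width and range of plain char, e.g. "-128..127" or "0..255"
   on a typical 8-bit two's complement implementation. */
void print_char_range(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("plain char is %s, range %d..%d\n",
           plain_char_is_signed() ? "signed" : "unsigned",
           CHAR_MIN, CHAR_MAX);
}
```

Calling `print_char_range()` from `main` is enough; the output only describes the implementation it was compiled with.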

In the ubiquitous ASCII character set, its ISO-8859-X extensions or the UTF-8 encoding, upper- and lowercase letters as well as digits have values below 127.

But such is not the case with the EBCDIC character set:

'A' is 0xC1, 'a' is 0x81 and '1' is 0xF1.

Since these values are above 127, does it mean the type char must be unsigned on 8-bit EBCDIC systems? Or can 'a', 'A' and '1' have negative values?

What about other character sets? Can the letters or digits ever have negative values?
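One way to probe the question empirically is to store every member of the basic execution character set in a plain char and test its sign. This is only a sketch: it can check the implementation it runs on, not prove a guarantee, and the character list is transcribed from C99/C11 section 5.2.1. The function name is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if every listed member of the basic execution character set is
   nonnegative when stored in a plain char, 0 otherwise. The list covers the
   52 letters, 10 digits, 29 graphic characters, space, and the control
   characters alert, backspace, form feed, newline, carriage return,
   horizontal tab and vertical tab (the terminating null is not tested). */
int basic_chars_nonnegative(void)
{
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
        " \a\b\f\n\r\t\v";

    /* sizeof basic includes the terminating null, so stop one short */
    for (size_t i = 0; i + 1 < sizeof basic; i++) {
        if (basic[i] < 0)   /* negative only if char is signed and the code is >= 128 */
            return 0;
    }
    return 1;
}
```

On an ASCII-based system this returns 1 whether plain char is signed or unsigned, because every listed code is below 128; the interesting case is an 8-bit EBCDIC system.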

chqrlie
  • Possible duplicate of [Is char signed or unsigned by default?](https://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default) – r3mainer Jul 15 '17 at 21:47
  • @squeamishossifrage: not a duplicate at all. I'm asking something much more precise than the question in reference. – chqrlie Jul 15 '17 at 21:57
  • chars are either signed or unsigned. If they're unsigned, then they're always positive. If not, then char values with the top bit set are negative. – r3mainer Jul 15 '17 at 21:59
  • @squeamishossifrage: I am well aware of this regrettable fact. I'm asking if we can always assume `'a' > 0`, especially on EBCDIC systems where `'a'` has the 8-bit encoding `1000 0001`. – chqrlie Jul 15 '17 at 22:02
  • @chqrlie "basic execution character .... guaranteed to be nonnegative." was the cite I was looking for [here](https://stackoverflow.com/questions/45104764/relationship-between-char-and-ascii-code/45104867#comment77184210_45104867). Nice Q & A. – chux - Reinstate Monica Jul 15 '17 at 23:58
  • @chux: thank you for the suggestion. This question was too subtle for a Saturday evening, I guess. – chqrlie Jul 16 '17 at 00:03
  • @Peter: the point of this question is exactly this! There is such a guarantee in the C Standard, and has been for a while. See the accepted answer for details. Also `char` values and character constants outside the execution character set should be cast to `(unsigned char)` when passed to `isalpha()`, `isalnum()`... because these functions have undefined behavior for negative values (except `EOF`). – chqrlie Jul 16 '17 at 09:05
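The cast mentioned in the last comment is the usual idiom when passing arbitrary `char` data to the `<ctype.h>` functions. A minimal sketch, using a hypothetical helper that counts alphabetic characters:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the alphabetic characters in a null-terminated string.
   Each char is converted to unsigned char before calling isalpha(),
   because passing a negative value other than EOF is undefined behavior.
   Members of the basic execution character set are nonnegative anyway,
   but other bytes (e.g. 0xE9 with a signed 8-bit char) may not be. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

For example, `count_alpha("abc123XYZ!")` counts the six letters; without the cast, a string containing a byte such as `'\xE9'` could pass a negative argument to `isalpha()` on a signed-char implementation.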

1 Answer


C99 states that:

6.2.5 Types

An object declared as type char is large enough to store any member of the basic execution character set.

If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.

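Thus, if the machine in question uses EBCDIC encoding and 8-bit chars, a C99-compliant compiler for this machine must make plain char unsigned: 'a' (0x81, i.e. 129) is a member of the basic execution character set, but a signed 8-bit char can hold at most 127, so 0x81 could only be represented in it as a negative value.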
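The guarantee can also be written down as compile-time checks. This sketch assumes a C11 compiler for `_Static_assert`; the `LIKELY_EBCDIC` macro is a hypothetical heuristic, not a standard feature. It uses the fact that EBCDIC lowercase letters are not contiguous ('i' is 0x89, 'j' is 0x91), a gap of 8 that is the same whether char is signed or unsigned.

```c
#include <limits.h>

/* A character constant has type int, but its value is that of a char
   holding the character converted to int. On a signed-char EBCDIC
   system 'a' would therefore be -127; C99 6.2.5 rules this out. */
_Static_assert('a' > 0 && 'A' > 0 && '1' > 0,
               "basic execution characters must be nonnegative");

/* Hypothetical heuristic: 'j' - 'i' is 1 in ASCII and 8 in EBCDIC. */
#define LIKELY_EBCDIC ('j' - 'i' == 8)

/* With 8-bit char, 0x81 exceeds SCHAR_MAX (127), so an EBCDIC
   implementation can keep 'a' nonnegative only if plain char is unsigned. */
_Static_assert(!(CHAR_BIT == 8 && LIKELY_EBCDIC) || CHAR_MIN == 0,
               "8-bit EBCDIC requires plain char to be unsigned");
```

The checks are deliberately done in ordinary constant expressions rather than `#if`, because the value of a character constant inside `#if` is allowed to differ from its value in normal expressions.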

hidefromkgb
  • More precisely: if the machine in question has 8-bit chars and uses EBCDIC encoding, then the C99 compliant compiler designed for this machine must use unsigned char-s by default. – chqrlie Jul 15 '17 at 22:10
  • @chqrlie any system must offer `unsigned char`, you mean to say that plain `char` should be unsigned. Plain char is a distinct type from `unsigned char` so I think it is confusing to use the term `unsigned char` to mean plain char being unsigned. – M.M Jul 16 '17 at 02:58
  • also, C11 5.2.1/3 defines the *basic execution character set* to include `'a'` and `'0'` – M.M Jul 16 '17 at 02:59
  • FYI, very similar wording appeared in C90: "An object declared as type char is large enough to store any member of the basic execution character set. If a member of the required source character set enumerated in §2.2.1 is stored in a char object, its value is guaranteed to be positive." per https://webcache.googleusercontent.com/search?q=cache:sMhqRXLYNQ0J:flash-gordon.me.uk/ansi.c.txt+&cd=1&hl=en&ct=clnk&gl=us – zwol Jul 19 '17 at 02:46
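M.M's point that plain `char` is a type of its own, distinct from both `signed char` and `unsigned char`, can be seen with a C11 `_Generic` selection. This is an illustrative sketch; the macro and function names are hypothetical.

```c
/* Maps an expression to the name of its character type. This compiles
   only because char, signed char and unsigned char are three distinct,
   mutually incompatible types: a _Generic selection may not list two
   compatible types. */
#define CHAR_TYPE_NAME(x) _Generic((x),  \
    char:          "char",               \
    signed char:   "signed char",        \
    unsigned char: "unsigned char",      \
    default:       "other")

/* Yields "char" even on implementations where plain char is unsigned. */
const char *name_of_plain_char(void)
{
    char c = 'a';
    return CHAR_TYPE_NAME(c);
}
```

Note that `CHAR_TYPE_NAME('a')` selects `"other"`, since in C a character constant has type `int`, not `char`.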