Forgive me if this is a stupid question, but I was wondering what character set C's `char` type uses. At first I thought it was ASCII, but then I realized a `char` can hold values up to 255, which exceeds ASCII's maximum code of 127. What character set is this? Extended ASCII?

  • You're almost certainly encountering [UTF-8](https://en.wikipedia.org/wiki/UTF-8). – Chris Oct 20 '21 at 22:59
  • This question does not make sense. The `char` data type can only represent numbers. How these numbers are interpreted is a matter of your program, or, if you print them to a terminal/console, then it is a matter of which character set the operating system is using. – Andreas Wenzel Oct 20 '21 at 23:15
  • @Andreas: Values of the `unsigned char` type are not just numbers in C. The C standard includes various semantics that interpret them as characters, including some requirements on the execution character set (including consecutive codes for digit characters), classifications in `<ctype.h>`, various meanings in `strtod`, and more. – Eric Postpischil Oct 20 '21 at 23:23
  • duplicates: [What character set does C's "char" use?](https://stackoverflow.com/q/32940276/995714), [C standard : Character set and string encoding specification](https://stackoverflow.com/q/12204453/995714), [What is the default encoding for C strings?](https://stackoverflow.com/q/3996026/995714) – phuclv Oct 21 '21 at 00:05

1 Answer

The C standard does not require C implementations to use a particular character set. It requires the execution character set (used in running programs, in contrast to the source character set used when compiling) to have the Latin alphabet letters A-Z and a-z, the digits 0-9, and these characters:

!"#%&'()*+,-./:;<=>?[\]^_{|}~

the space character, and characters for horizontal tab, vertical tab, form feed, alert, backspace, carriage return, and new line. It requires the codes for the digits to be consecutive from the code for 0 to the code for 9, and the character value zero must be available to mark the end of strings. Otherwise, it leaves the character set up to each C implementation.

C implementations overwhelmingly use ASCII for the character codes 0-127. There may be somewhat more variation in what implementations use for codes 128-255.

Andreas Wenzel
Eric Postpischil
  • The `char` type which the question is about does not necessarily support codes 128 and larger, because it has implementation-defined signedness. And the signedness isn't necessarily picked to suit a certain symbol table. The largest value a `char` can hold, portably, is 127. – Lundin Oct 21 '21 at 06:59