
Is there a way to get the decimal value (as an int) of ASCII / extended ASCII characters in C, especially the extended ones?

ASCII & extended ASCII table : http://www.theasciicode.com.ar/

Here is an example of my problem:

int a = (int) 'a';
int b = (int) '│';

printf("%i\n", a);
printf("%i\n", b);

The output is:

97
14849154

In the extended ASCII table linked above, "│" is 179.

chux - Reinstate Monica
Fire Frost

1 Answer


OP's platform is relying on implementation-defined behavior: the value of a character constant containing a character outside the basic character set is not fixed by the C standard.

The source file is evidently encoded as UTF-8. The '│' is the Unicode character U+2502.

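When encoded as UTF-8, it is the 3-byte sequence 0xE2 0x94 0x82. Packed in big-endian order, that is 0xE29482, which is 14849154 in decimal, exactly the value OP printed. So on this platform the following is a multi-character constant, not a single byte: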

 int b = (int) '│';
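
The arithmetic can be checked with a small sketch. The helper name `pack_bytes_big_endian` is hypothetical, and the packing order (first byte most significant) is only assumed from the value OP observed; the C standard leaves the value of such a constant implementation-defined, so another compiler may produce something else.

```c
#include <stddef.h>

/* Hypothetical helper: pack n bytes into one value, with the first
   byte ending up most significant. This reproduces the number OP saw;
   it is not behavior guaranteed by the C standard. */
unsigned long pack_bytes_big_endian(const unsigned char *bytes, size_t n)
{
    unsigned long value = 0;
    for (size_t i = 0; i < n; i++) {
        value = (value << 8) | bytes[i];  /* shift in the next byte */
    }
    return value;
}
```

Called on the three UTF-8 bytes `{0xE2, 0x94, 0x82}`, this returns 0xE29482, i.e. 14849154.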

Note: ASCII is only defined for codes 0 to 127.
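
Since 179 is therefore not an ASCII code, getting it means mapping the character to a specific 8-bit code page. Below is a minimal sketch, assuming the "extended ASCII" table OP links to is code page 437. The function name `cp437_from_codepoint` and the table are illustrative only and cover just a handful of box-drawing characters, not the whole code page.

```c
#include <stddef.h>

/* Illustrative subset of code page 437 (an assumption about which
   "extended ASCII" table is meant): Unicode code point -> CP437 value. */
struct cp437_entry {
    unsigned long codepoint;
    int cp437;
};

static const struct cp437_entry cp437_table[] = {
    { 0x2502UL, 179 },  /* │ */
    { 0x2524UL, 180 },  /* ┤ */
    { 0x2510UL, 191 },  /* ┐ */
    { 0x2514UL, 192 },  /* └ */
    { 0x2500UL, 196 },  /* ─ */
    { 0x253CUL, 197 },  /* ┼ */
    { 0x2518UL, 217 },  /* ┘ */
    { 0x250CUL, 218 },  /* ┌ */
};

/* Returns the CP437 value for a code point, or -1 if this small
   table does not cover it. Codes 0..127 are plain ASCII. */
int cp437_from_codepoint(unsigned long cp)
{
    if (cp < 128) {
        return (int)cp;
    }
    for (size_t i = 0; i < sizeof cp437_table / sizeof cp437_table[0]; i++) {
        if (cp437_table[i].codepoint == cp) {
            return cp437_table[i].cp437;
        }
    }
    return -1;
}
```

With this table, `cp437_from_codepoint(0x2502UL)` gives 179, the value OP expected for '│'.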

chux - Reinstate Monica
  • Yes, but you still have to wonder what the source charset actually is, what the compiler is being told it is (and what the compiler is being told to use as the execution charset). It is possible that they are all intentionally UTF-8. It is also possible that the actual source charset is UTF-8 but the compiler is being told differently. Overall, it seems like UTF-8 is a mistake in this project. – Tom Blodget Oct 13 '17 at 19:35
  • @tom Concerning "the actual source charset is UTF-8", [What's the difference between encoding and charset?](https://stackoverflow.com/q/2281646/2410359) may be useful. I would say [UTF-8](https://en.wikipedia.org/wiki/UTF-8) is an encoding and not a [charset](https://en.wikipedia.org/wiki/Character_encoding). – chux - Reinstate Monica Oct 13 '17 at 19:43
  • Agreed but charset is the term used in various compiler arguments for encoding. – Tom Blodget Oct 13 '17 at 19:45