5

Quote from C++03 2.2 Character sets:

"The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set ... The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific."

According to this, the value of 'A', which belongs to the execution character set, is implementation-defined. So it's not necessarily 65 (the ASCII code of 'A' in decimal)? What?!

// Not always 65?
printf ("%d", 'A');

Or have I misunderstood what the value of a character in the execution character set means?

Eric Z
  • 2
    So, if you are running on a machine that is using EBCDIC, do you expect `char c = 'A'; cout << c << endl;` to output an A or something else? In EBCDIC, 'A' has the value 193. – Mats Petersson May 02 '13 at 13:41
  • 2
    Just curious, but has anyone seen anything other than EBCDIC or an extension of ASCII? In C or C++, that is: I'm familiar with other encodings which were used before C came along, but I don't think that there was ever a C compiler which used them. (Most of the earliest encodings didn't distinguish upper and lower case, so they could fit in 6 bits.) – James Kanze May 02 '13 at 13:44
  • 4
    The value of `'A'` is `'A'`. – Kerrek SB May 02 '13 at 13:46
  • @KerrekSB: What I mean by value is the binary representation of 'A' on a machine that only has '0' and '1'. That should be clear from the context. – Eric Z May 02 '13 at 14:21
  • @Mats, oh, I just wasn't aware that other encodings had been developed independently of ASCII. Thanks. – Eric Z May 02 '13 at 14:25
  • @James Kanze, I think that's the main reason why some developers like me thought ASCII is the only standardized encoding out there.. – Eric Z May 02 '13 at 14:28
  • @EricZ Come now. It's been decades since I've seen ASCII. It's mostly UTF-8 today, and before that, ISO 8859-1. (But both of these have the same code points as ASCII for the first 128 codepoints.) – James Kanze May 02 '13 at 14:34

2 Answers

7

Of course it can be ASCII's 65, if the execution character set is ASCII or a superset (such as UTF-8).

It doesn't say "it can't be ASCII", it says that it is something called "the execution character set".
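To see what a given implementation actually does, something like the following minimal sketch could be used. It assumes a C++11 compiler; the helper names `value_of_A` and `a_is_ascii` are made up for illustration and are not part of any library.

```cpp
// Hypothetical helper: the implementation-defined value of 'A' as an int.
// This is 65 on ASCII-compatible execution character sets and 193 on EBCDIC.
constexpr int value_of_A() { return 'A'; }

// Hypothetical helper: true when this implementation encodes 'A' the way ASCII does.
constexpr bool a_is_ascii() { return 'A' == 65; }

// Optional: code that really depends on ASCII could refuse to compile elsewhere.
// static_assert(a_is_ascii(), "this code assumes an ASCII-compatible execution character set");
```

Printing `value_of_A()` with `printf("%d")` shows the number the compiler picked, and the commented-out `static_assert` is one way to make an ASCII assumption explicit instead of silent.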

unwind
  • 1
    So I guess neither comparing 'A' with 65 nor writing its ASCII code to a binary file is portable, right? – Eric Z May 02 '13 at 14:16
  • If you have a file containing binary data in "raw" form, then it's not very portable anyway. You need to know endianness and other such things. If you have a textfile that is ASCII and want to use it in EBCDIC, there are translation programs (The unix/linux `dd` for example). – Mats Petersson May 02 '13 at 14:28
  • @Mats, Endianness is better known to developers than ASCII counterparts like EBCDIC, I think;) – Eric Z May 02 '13 at 14:32
  • Yes, because it's a more commonly encountered problem. And I must say I haven't ever used a machine with EBCDIC (to my knowledge at least) – Mats Petersson May 02 '13 at 14:37
  • @EricZ Unless they work on mainframes. IBM mainframes still use EBCDIC. (I actually had to deal with data coming in in EBCDIC a little over 10 years ago; from what I understand, the situation hasn't changed since then. And 35-40 years ago, several other encodings.) – James Kanze May 02 '13 at 14:40
1

So, the standard allows the "execution character set" to be something other than ASCII or an ASCII derivative. One example is the EBCDIC character set that IBM has used for a long time (there are probably still machines about using EBCDIC, but I suspect anything built in the last 10-15 years wouldn't be using that). The encoding of characters in EBCDIC is completely different from ASCII.

So, expecting, in code, that 'A' has any particular value is not portable. A whole heap of other "common assumptions" fail as well: that there are no "holes" between A-Z, and that 'a' - 'A' == 32, are both false in EBCDIC. At least the characters A-Z are in the correct order! ;)
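A rough sketch of how such assumptions might be avoided is given here. The function names are hypothetical, and the code relies only on the guarantee that the digits '0' to '9' are consecutive and in order, plus the standard `<cctype>` and `<cstring>` functions.

```cpp
#include <cctype>
#include <cstring>

// The digits '0'..'9' are guaranteed to be consecutive and in order,
// so subtracting '0' is portable. Returns -1 if c is not a decimal digit.
int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Hypothetical helper: position of an upper-case letter in the alphabet
// (0 for 'A', 25 for 'Z'), or -1 otherwise. It uses a table lookup rather
// than c - 'A', which gives wrong answers when the letters have holes
// between them, as in EBCDIC.
int upper_letter_index(char c) {
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (c == '\0') return -1;  // strchr would otherwise match the terminator
    const char* p = std::strchr(letters, c);
    return p ? static_cast<int>(p - letters) : -1;
}

// Case conversion through the library instead of adding or subtracting 32.
char to_lower_portable(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```

Writing the character literal itself (`'A'`, `'0'`) rather than a number like 65 or 48 lets the compiler supply whatever value the execution character set uses.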

Mats Petersson
  • IBM mainframes still use EBCDIC today. (And FWIW: all that C and C++ require is that the 10 digits be consecutive and in order. The alphabet can be all over the place. The native collating sequence would be rather counter-intuitive if it were, but then, it's already counter-intuitive to have all of the upper case collate before any lower case, and to have a few odd punctuation characters between them. Not to mention what happens with accented characters in ISO 8859-1.) – James Kanze May 02 '13 at 14:43
  • Yes, but how many IBM mainframes are being produced per year these days? – Mats Petersson May 02 '13 at 14:49
  • As many as there always were? There have never been large numbers of mainframes; at the beginning, there weren't large numbers of anything else either. But I know a couple of places which still do most data processing on mainframes. For some types of work, it's still the preferred solution. – James Kanze May 02 '13 at 15:32
  • Yes, and of course the fact that the Cobol program to process it was written in 1983, and hasn't been much altered since means that moving to something else is a big effort ... ;) – Mats Petersson May 02 '13 at 16:41
  • New Cobol programs are also still being written. But I suspect that there are jobs for which a mainframe is the most appropriate answer. – James Kanze May 02 '13 at 17:03