
So I realize that assuming ascii encoding can get you in trouble, but I'm never really sure how much trouble you can have subtracting characters. I'd like to know what relatively common scenarios can cause any of the following to evaluate to false.

Given:

std::string test = "B";
char m = 'M';

A) (m-'A')==12

B) (test[0]-'D') == -2

Also, does the answer change for lowercase values (changing `'M'` to `'m'`, i.e. 77 to 109, of course)?

Edit: Digit subtraction answers this question for char digits: the standard says `'2'-'0'==2` must hold for the digits 0-9. But I want to know whether the same holds for a-z and A-Z, which section 2.3 of the standard is unclear on in my reading.

Edit 2: Removed ASCII specific content, to focus question more clearly (sorry @πάντα-ῥεῖ for a content changing edit, but I feel it is necessary). Essentially the standard seems to imply some ordering of characters for the basic set, but some encodings do not maintain that ordering, so what's the overriding principle?

Adam Martin
  • Downvotes/close votes care to comment? http://stackoverflow.com/questions/36310181/char-subtraction-in-c Is not a duplicate imho because it quotes the standard as holding for digits only. – Adam Martin Apr 18 '16 at 22:14
  • What if `char` is unsigned? – David Schwartz Apr 18 '16 at 22:52
  • @DavidSchwartz Good point but my examples are all signed. – Adam Martin Apr 18 '16 at 22:55
  • 1
    What do you think will happen if you do `(test[0]-'D')` on a platform where `char` is unsigned? – David Schwartz Apr 18 '16 at 22:58
  • @DavidSchwartz Did not realize that char could be unsigned w/o being declared as `unsigned char`. In which case B) could be problematic. Good to know! (For others like me: http://stackoverflow.com/questions/17097537/why-is-char-signed-by-default-in-c) – Adam Martin Apr 18 '16 at 23:01
  • @DavidSchwartz it will still work if `char` is `unsigned`, because both `'B' - 'D'` and `-2` convert to unsigned using the same rules – M.M Apr 18 '16 at 23:19
  • @M.M Why would the -2 convert to unsigned? – David Schwartz Apr 19 '16 at 05:39
  • @DavidSchwartz In fact it wouldn't. `'B'` and `'D'` promote to `int` and then the int subtraction produces `-2`. – M.M Apr 19 '16 at 5:45

1 Answer


In other words, when are chars in C/C++ not stored in ASCII?

The C and C++ languages don't have any notion of the actual character coding table used by the target system. The only convention is that character literals like `'A'` match the current encoding.

You could just as well deal with EBCDIC-encoded characters, and the code would look the same as it does for ASCII characters.

πάντα ῥεῖ
  • So there is no guarantee like with digits that the basic alphabet will be sequential? I.e. the standard guarantees `'2'-'0'==2`, but there is no guarantee for `'C'-'A'`? – Adam Martin Apr 18 '16 at 22:19
  • @AdamMartin Well, these should be guaranteed with a sane character encoding table, something like `m==77` isn't. – πάντα ῥεῖ Apr 18 '16 at 22:21
  • 1
    I realize that `m==77` is not going to hold, but in EBCDIC for example `A-Z` are not sequential. – Adam Martin Apr 18 '16 at 22:22
  • I should not have edited in the line you quoted about ASCII, was trying to boil down the question but I overdid it since I didn't know the answer/material well enough. – Adam Martin Apr 18 '16 at 22:27
  • @AdamMartin you answered your own question in that comment. – M.M Apr 18 '16 at 23:20
  • @M.M So even though they're sequential in the common character set they will be encoded in the system encoding which would not necessarily have them in order? – Adam Martin Apr 18 '16 at 23:37
  • The *execution character set* is what matters – M.M Apr 19 '16 at 05:34