22

Must a C++ implementation give the characters '0' to '9' contiguous numeric values, i.e. so that:

'0' -> 0+n
'1' -> 1+n
 m  -> m+n
'9' -> 9+n

I cannot find it mentioned in the documentation of isdigit ([classification] (22.3.3.1 Character classification)) *, nor can I find it in the locale documentation (but maybe I did not look hard enough).

In 2.3 Character sets, we find that

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

But it doesn't mention any ordering (but maybe I did not look hard enough).


*: Interesting footnote there:

When used in a loop, it is faster to cache the ctype<> facet and use it directly [instead of isdigit() et al, end comment], or use the vector form of ctype<>::is.
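As a rough illustration of what that footnote suggests, here is a minimal sketch that looks up the `std::ctype<char>` facet once and reuses it for every character. The function name `count_digits` and the choice of the classic "C" locale as the default are assumptions made for this example, not something taken from the standard text.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the digit characters in `text`.
// The facet is fetched once, outside the loop, instead of once per
// character as a call to std::isdigit(c, loc) would do.
std::size_t count_digits(const std::string& text,
                         const std::locale& loc = std::locale::classic()) {
    const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
    std::size_t count = 0;
    for (char c : text) {
        if (facet.is(std::ctype_base::digit, c)) {
            ++count;
        }
    }
    return count;
}
```

Whether this is measurably faster depends on the implementation; the sketch only shows the shape of the idea.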

Rohit Vipin Mathews
Sebastian Mach
  • 10
    Why the vote-for-close: `This question is not a good fit to our Q&A format. We expect answers to generally involve facts, references, or specific expertise; this question will likely solicit opinion, debate, arguments, polling, or extended discussion.` I have facts, references, specific expertise, and the answer will probably not involve solicit opinion, debate, argument, polling, but prolly a reference into the standard, so no extended discussion either? Is someone high of mod-powers? – Sebastian Mach Feb 23 '12 at 16:26
  • It's not in the locale stuff, because that has to deal with other digits too. (E.g. `Ⅿ` ;) ) – MSalters Feb 24 '12 at 10:04

1 Answer

25

Indeed, I had not looked hard enough. In 2.3 Character sets, item 3:

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

And this is the above list of decimal digits:

0 1 2 3 4 5 6 7 8 9

Therefore, an implementation must use a character set in which the decimal digits have contiguous values. Optimizations that rely on this property are safe; however, optimizations that rely on the contiguity of other characters (e.g. 'a'..'z') are not portable w.r.t. the standard (see also header <cctype>). If you do rely on it, make sure to assert that property.
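A minimal sketch of both cases, assuming nothing beyond the guarantee quoted above. The names `digit_value` and `is_contiguous_run` are hypothetical helpers invented for this example. The first relies only on the guaranteed contiguity of '0'..'9'; the second is one way to assert, at compile time, the non-guaranteed contiguity of 'a'..'z' before depending on it.

```cpp
// Hypothetical helper: value of a decimal digit character, or -1 if
// `c` is not a decimal digit. Safe on any conforming implementation,
// because '0'..'9' are required to be contiguous and ascending.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Hypothetical helper: true if the characters of `s` have consecutive
// values starting at `first`.
constexpr bool is_contiguous_run(const char* s, char first) {
    return *s == '\0' ||
           (*s == first && is_contiguous_run(s + 1, first + 1));
}

// Letters carry no such guarantee (EBCDIC, for example, has gaps), so
// refuse to compile where the assumption does not hold.
static_assert(is_contiguous_run("abcdefghijklmnopqrstuvwxyz", 'a'),
              "this code assumes 'a'..'z' are contiguous");
```

With the `static_assert` in place, code such as `c - 'a'` fails loudly at build time on a platform where the assumption is wrong, instead of silently computing wrong indices.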

Sebastian Mach
  • Thanks @cHao for the hint. Astonishing. – Sebastian Mach Feb 23 '12 at 16:22
  • 1
    As it happens, both ASCII (and its derivatives) and EBCDIC assign contiguous values to the decimal digits. ASCII makes the lowercase letters contiguous, as well as the uppercase letters; EBCDIC does not. That's probably why C and C++ require consecutive digits, but not consecutive letters. The vast majority of C++ implementations use ASCII or one of its derivatives (Latin-1, Windows-1252, Unicode, etc.); the vast majority of the rest use EBCDIC. – Keith Thompson Feb 24 '12 at 06:50
  • @CodingMastero: I usually wait some days to encourage more answers. Maybe someone provides some historical background besides the references :) – Sebastian Mach Feb 24 '12 at 11:00
  • It's you who asked and answered, too. What more do you need? – Rohit Vipin Mathews Feb 24 '12 at 11:03
  • @CodingMastero: True, but often enough, some answerers provide additional information and insight. I didn't want to discourage anyone from posting. However, the time buffer is over and I accepted. – Sebastian Mach Feb 29 '12 at 09:27
  • If ISO C also has the same guarantee, could you mention that in this answer? It came up when I googled for `C digits contiguous`. Update: it does, [Why does subtracting '0' in C result in the number that the char is representing?](https://stackoverflow.com/a/15598759) – Peter Cordes Apr 29 '21 at 01:24