
I was writing this answer and I quoted from http://en.cppreference.com/w/cpp/string/byte/tolower#Parameters

Is not representable as unsigned char and does not equal EOF, the behavior is undefined

When I went to inspect the edit that added this phrase, I found the author's comment:

Can't use negative signed chars with any ctype.h function per C99 7.4/1

The author is citing from the C99 standard in C++ documentation. Is that valid? I couldn't find anything on the definition of this function in the C++ standard, so I must assume that it is valid.

But this concerns me for 2 reasons:

  1. How would I know what version of the C standard the C++ standard depends upon?
  2. There are lists of the discrepancies between C and C++ everywhere. If I'm looking at the C standard with reference to C++ how could I possibly know whether the area I'm looking at has been overridden?
Jonathan Mee
  • FWIW, the first description, "Is not representable as unsigned char ..." is a pretty close paraphrase of the C99 and C11 requirements. The author's comment "Can't use negative unsigned chars..." is wrong (not to mention self-contradictory). Sorry about confusing things; I didn't read your question carefully enough the first time around. – Pete Becker Jun 03 '16 at 12:58
  • Whoops, misquoted the second comment: it's "negative signed chars", not "negative unsigned chars". Nevertheless, "negative signed chars" is simply wrong, both because negative values that are not representable as signed chars are also prohibited, and because `EOF`, which may well be representable as a negative signed char, is allowed. – Pete Becker Jun 03 '16 at 13:43
  • related, if not even a duplicate: http://stackoverflow.com/a/34308279/2003898 – dhein Jun 03 '16 at 14:44
  • @PeteBecker that was my commit comment, I was referring to \xb4 that I changed to unsigned with it (while also adding the mention of UB). But yes, it wasn't sufficiently pedantic. – Cubbi Jun 03 '16 at 19:02

3 Answers


For the first question:

The C++ standard explicitly lists the C standard(s) on which it depends in its Normative references section. For C++14, [intro.refs] 1.2/1 happens to list C99:

  • ISO/IEC 9899:1999, Programming languages — C
  • ISO/IEC 9899:1999/Cor.1:2001(E), Programming languages — C, Technical Corrigendum 1
  • ISO/IEC 9899:1999/Cor.2:2004(E), Programming languages — C, Technical Corrigendum 2
  • ISO/IEC 9899:1999/Cor.3:2007(E), Programming languages — C, Technical Corrigendum 3

For the second question:

The C++ standard does not implicitly incorporate any parts of the C standard; all references to the C standard are explicit. A good source of information on where C++ deviates from C is Annex C ("Compatibility") of the C++ standard, particularly C.1 [diff.iso].
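For example, one entry in Annex C ([diff.lex] in C++14) is that a character literal such as 'a' has type char in C++ but type int in C. A minimal compile-time check of the C++ side, assuming a C++11 or later compiler:

```cpp
#include <type_traits>

// In C++ a character literal has type char (Annex C, [diff.lex]);
// in C the same literal has type int, so there sizeof('a') == sizeof(int).
static_assert(std::is_same<decltype('a'), char>::value,
              "a character literal has type char in C++");

// sizeof(char) is 1 by definition, so this follows from the check above.
static_assert(sizeof('a') == 1, "sizeof('a') == sizeof(char) == 1 in C++");
```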

Additionally, references to the C standard library are scattered throughout the description of the C++ standard library (chapters 17–30 in C++14). Of particular interest are:

  • 17.2 [library.c], which describes the basic inclusion of the C standard library
  • Chapter 18 [language.support], which describes many of the <cname> headers of the C++ standard library (those that offer the C standard library functionality).
Angew is no longer proud of SO

How would I know what version of the C standard the C++ standard depends upon?

In C++14, it's ISO/IEC 9899:1999 plus three technical corrigenda (so, in essence, C99), as stated in 1.2 [intro.refs] of N4140. In C++98 it was C90, and C++17 references C11. Either way, the C++ standard always makes the version explicit.
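The edition-to-edition mapping can be written down as a lookup on the predefined macro __cplusplus. This is only a sketch: referenced_iso_9899_year is a hypothetical helper, and the C++11 and C++20 rows come from those editions' own [intro.refs] lists rather than from this answer. Values outside the table return 0 instead of a guess.

```cpp
// Hypothetical helper (not a standard or library function): maps a value of
// the predefined macro __cplusplus to the year in the ISO/IEC 9899 (C)
// document number listed in that C++ edition's [intro.refs].
// Returns 0 for values this sketch does not cover, instead of guessing.
constexpr long referenced_iso_9899_year(long cplusplus) {
    return cplusplus == 199711L ? 1990   // C++98 / C++03 -> C90
         : cplusplus == 201103L ? 1999   // C++11 -> C99 (+ TC1 to TC3)
         : cplusplus == 201402L ? 1999   // C++14 -> C99 (+ TC1 to TC3)
         : cplusplus == 201703L ? 2011   // C++17 -> C11
         : cplusplus == 202002L ? 2018   // C++20 -> C17
         : 0;                            // not covered here
}

// The C edition referenced by the C++ edition this file is compiled as
// (0 if the compiler reports a value not in the table above).
constexpr long kReferencedCYear = referenced_iso_9899_year(__cplusplus);
```

One caveat: MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed, so the compiled-in value is only as reliable as the compiler's macro.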

If I'm looking at the C standard with reference to C++ how could I possibly know whether the area I'm looking at has been overridden?

You look in the C++ standard. For each area, it either explicitly imports the C definitions (minus `restrict`, which C++ does not have) along with whatever C behavior it wants, or it spells out explicit modifications.

Usually, reading good documentation instead of the standard itself will serve you just fine.


To address your initial question:

The author is citing from the C99 standard in C++ documentation. Is that valid?

Yes, because

1 Tables 74 [contains std::tolower, me], 75, 76, 77, 78, and 79 describe headers <cctype>, <cwctype>, <cstring>, <cwchar>, <cstdlib> (character conversions), and <cuchar>, respectively.
2 The contents of these headers shall be the same as the Standard C Library headers <ctype.h>, <wctype.h>, <string.h>, <wchar.h>, and <stdlib.h> and the C Unicode TR header <uchar.h>, respectively, with the following modifications [none of those apply to std::tolower, me]:

21.8 [c.strings] in N4140
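Because std::tolower from <cctype> is specified as the C function, it inherits the C precondition, so C++ code that lower-cases a std::string usually routes each char through unsigned char first. A minimal sketch, assuming the default "C" locale; to_lower_copy is a hypothetical helper name, not a standard function:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases a copy of s using std::tolower from <cctype>.
// The lambda takes unsigned char, so every byte reaches std::tolower as a
// value in 0..UCHAR_MAX. Passing a plain char directly could hand it a
// negative value (e.g. '\xb4' where char is signed), which is undefined.
std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

For example, to_lower_copy("Hello, World!") yields "hello, world!"; in the "C" locale, bytes outside A to Z come back unchanged.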

Baum mit Augen
  • These are very delicate matters. Please, how can one distinguish "reading **good** documentation instead of the standard itself" unless one is a member of the standards committee, or at least a respected author with a few dozen million lines of successful code behind them? Thanks, @Baum mit Augen – user3078414 Jun 03 '16 at 13:18
  • @user3078414 My rule of thumb: If I need to look into the standard itself to find out whether or not my code is valid, it's too complicated or smart. I mostly need the standard for stuff I cannot control, like checking the correctness of documentation on cppr (rare), or verifying some behavior in *"Just for Fun"* situations like SO and bug reports. For real code, documentation is your friend. cppr is mostly correct and cites its sources in the standards. – Baum mit Augen Jun 03 '16 at 13:22
  • For SO questions tagged language-lawyer, only the standard itself counts, of course, but that's one of those just-for-fun things. – Baum mit Augen Jun 03 '16 at 13:26

The edit is correct and this particular text has been in the standard since C90.

From C90 4.3

The header <ctype.h> declares several functions useful for testing and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

From C11 7.4/1

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

The text is essentially identical (C11 only says "classifying" where C90 says "testing"); C has always been like this. So it doesn't matter which C version your particular C++ version references, because this rule is the same in all of them.
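The quoted rule translates directly into a check. A minimal sketch using only <cctype>, <climits> and <cstdio>; is_valid_ctype_arg and checked_tolower are hypothetical helpers, not standard functions. Converting through unsigned char first is what keeps a byte like '\xb4' (mentioned in the comments on the question) in range on platforms where plain char is signed.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>

// True if v is an acceptable argument for the <ctype.h>/<cctype> functions:
// representable as unsigned char (0..UCHAR_MAX) or equal to EOF.
constexpr bool is_valid_ctype_arg(int v) {
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

// Hypothetical wrapper: converts a plain char through unsigned char before
// calling std::tolower, so the argument is always in range even where char
// is signed. The assert documents the precondition stated in the C text.
char checked_tolower(char c) {
    const int arg = static_cast<unsigned char>(c);
    assert(is_valid_ctype_arg(arg));
    return static_cast<char>(std::tolower(arg));
}
```

Passing the plain char '\xb4' straight to std::tolower on a signed-char platform would hand it a negative value that is generally not EOF, which is exactly the undefined case.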

Lundin