4

isspace() has defined behavior only if its argument is representable as unsigned char or is equal to EOF.

getchar() reads the next character from stdin.

When getchar() != EOF, are all values returned by getchar() representable as unsigned char?

uintmax_t count_space = 0;
for (int c; (c = getchar()) != EOF; )
  if (isspace(c))
    ++count_space;

Can this code lead to undefined behavior?

jfs
  • All the ctype.h functions were designed so that they could deal with input that could be either a character or `EOF`. From the C rationale: "Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF." – Lundin Nov 06 '17 at 16:11
  • @Lundin `isspace(CHAR_MIN)` is undefined on some platforms ([follow the link in the question or in the answer below](https://stackoverflow.com/q/25776824/4279)). – jfs Nov 06 '17 at 18:37

2 Answers

10

According to C11 WG14 draft version N1570:

§7.21.7.6/2 The getchar function is equivalent to getc with the argument stdin.

§7.21.7.5/2 The getc function is equivalent to fgetc...

§7.21.7.1/2 [!=EOF case] ...the fgetc function obtains that character as an unsigned char converted to an int...

(The text in [...] is mine.)

i.e.,

  • isspace() accepts getchar() values
  • all getchar()!=EOF values are representable as unsigned char
  • there is no undefined behavior here.
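As a minimal sketch of the conclusion above, the question's loop can be wrapped in a helper that reads from any stream. The name `count_spaces` and the `FILE *` parameter are illustrative choices, not part of the question, and the sketch assumes the usual case where UCHAR_MAX <= INT_MAX.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: counts whitespace bytes until EOF or a read error. */
uintmax_t count_spaces(FILE *stream)
{
    uintmax_t count = 0;
    int c; /* int, not char: it must hold every unsigned char value and EOF */

    while ((c = getc(stream)) != EOF)
        if (isspace(c)) /* c came from getc() and is not EOF, so it is a valid argument */
            ++count;
    return count;
}
```

Calling `count_spaces(stdin)` would behave like the loop in the question. Storing the result of getc() in an int rather than a char is what keeps a byte such as 0xFF distinct from EOF.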

If you think it is too obvious ("what else can it be"), think again. For example, in a related case, isspace(CHAR_MIN) may be undefined: where char is signed, CHAR_MIN is negative and usually not equal to EOF. That is, it may be undefined behavior to pass a char to a character classification function!
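The common way to avoid this when the input is a char taken from a string (rather than an int returned by getchar()) is to convert it to unsigned char first. A minimal sketch, where `count_spaces_in_string` is a hypothetical name:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts whitespace characters in a NUL-terminated string. */
size_t count_spaces_in_string(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s)
        /* The cast maps a possibly negative char into 0..UCHAR_MAX,
           which is the domain the <ctype.h> functions accept. */
        if (isspace((unsigned char)*s))
            ++count;
    return count;
}
```

Without the cast, a byte such as 0xE9 stored in a signed char would reach isspace() as a negative int.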

If UCHAR_MAX > INT_MAX, the result of converting the unsigned char to int may be implementation-defined:

§6.3.1.3/3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
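A program that wants to know whether it is on such a platform can compare the two limits from <limits.h>. This is only a sketch; the function name `getc_values_fit_in_int` is made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when every unsigned char value is representable
   as int, i.e. when the conversion done by fgetc() preserves the value. */
bool getc_values_fit_in_int(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

The same condition could instead be enforced at compile time with a C11 `_Static_assert(UCHAR_MAX <= INT_MAX, "...")`.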

jfs
  • So come back to your point about `CHAR_MIN`. It seems to me that this is always undefined, because negative values can never be represented as `unsigned char`, only converted to it. The notable exception is the case where `CHAR_MIN` would be `EOF`. – Jens Gustedt Nov 05 '17 at 10:23
  • @JensGustedt `CHAR_MIN` may be 0 on some platforms. – jfs Nov 05 '17 at 10:30
  • Yes, but the interesting case is `CHAR_MIN` being negative. Or stated otherwise, all these functions are only defined for non-negative `char` values, plus an exception for `EOF`. And all members of the basic execution character set have non-negative values. – Jens Gustedt Nov 05 '17 at 13:15
  • you said *"this is **always** undefined"* [emphasis mine]. I said: *"CHAR_MIN may be 0 on some platforms."* i.e., the behavior is NOT undefined on such platforms. – jfs Nov 05 '17 at 13:25
1

The return value of getchar() is specified in the same way as that of fgetc(). C11 defines the behavior and return value of fgetc() in 7.21.7.1p2-3:

  2. If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

Returns

  3. If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

Since this is an unsigned char converted to an int, the int will almost always have the same value as the unsigned char.

It might not hold for high values on platforms where sizeof(int) == 1, because there UCHAR_MAX > INT_MAX. These, however, are mostly DSP platforms, where character classification is almost certainly not needed.


The is* functions are carefully defined so that they can be used directly with the return value of the getc family (getc, fgetc, getchar). C11 7.4p1:

1 The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

That is, it is legal to pass even EOF to the is* functions. Of course, isanything(EOF) always returns 0, so to count consecutive leading whitespace characters one could simply use something like:

while (isspace(getchar())) space_count++;
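Note that the one-liner also consumes the first non-whitespace character. A slightly longer sketch that puts that character back with ungetc() might look like this; `count_leading_spaces` and its `FILE *` parameter are hypothetical names chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts whitespace at the current stream position and
   leaves the first non-whitespace character unread. */
size_t count_leading_spaces(FILE *stream)
{
    size_t count = 0;
    int c = getc(stream);

    while (isspace(c)) { /* passing EOF is allowed; isspace(EOF) is 0 */
        ++count;
        c = getc(stream);
    }
    if (c != EOF)
        ungetc(c, stream); /* push back the character that ended the run */
    return count;
}
```

The loop needs no separate EOF test because EOF is inside the domain of isspace() and is never classified as whitespace.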

However, negative char values other than EOF are not OK. For example, the MSVC debug C library is known to abort if a negative value other than EOF is passed to any of the character classification functions.
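The domain described in 7.4p1 can be written down as a small predicate, which may help when auditing calls to the is* functions. This is an illustrative sketch only; `is_valid_ctype_arg` is not a standard function.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if c may be passed to the <ctype.h> functions,
   i.e. it equals EOF or is representable as unsigned char. */
bool is_valid_ctype_arg(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```

Any value returned by getchar() satisfies this predicate on platforms where UCHAR_MAX <= INT_MAX, whereas a plain char holding a negative value generally does not.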