
I am using gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1

The man page for isalnum() says:

SYNOPSIS
       #include <ctype.h>

       int isalnum(int c);

However, it also says:

These functions check whether c, which must have the value of an unsigned char or EOF, ...

I have found that isalnum() will blow up for very large positive (or negative) int values (but it handles all short int values).

Is the man page saying the int passed in must have a value of an unsigned char because the C library writers are reserving the right to implement isalnum() in a way that will not handle all int values without blowing up?

Jonathan Leffler
Scooter

1 Answer


The C standard says as much...

In ISO/IEC 9899:1999 (the old C standard), it says:

§7.4 Character handling

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

(I've left out a footnote.) Both C89 and C11 say very much the same thing.
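In practice, the requirement matters most when the argument comes from a plain char, which may be signed and can therefore hold negative values other than EOF. A minimal sketch of the usual defensive idiom follows; the helper name count_alnum is hypothetical and only there for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the alphanumeric bytes in a string.
 * The cast to unsigned char keeps the argument inside the range
 * that the standard requires for the <ctype.h> functions, even
 * when plain char is signed and *s is negative. */
static size_t count_alnum(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalnum((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Calling count_alnum("abc 123!") in the default "C" locale counts the six letters and digits and skips the space and the exclamation mark.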

One common implementation is to use an array offset by 1 — a variation on the theme of:

int _Ctype_bits[257] = { ... };

#define isalpha(c)  (_Ctype_bits[(c)+1] & _ALPHA)

As long as c is in the range of integers that an unsigned char can store (and there are 8 bits per character, EOF is -1, and the initialization is correct), then this works beautifully. Note that the macro expansion only uses the argument once, which is another requirement of the standard. But if you pass random values outside the stipulated range, you access random memory (or, at the least, memory that is not initialized to contain the correct information).
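To make the offset-by-one table idea concrete, here is a minimal sketch. It is not the code of glibc or any other real library: the names my_ctype_bits, MY_ALPHA, my_ctype_init and my_isalpha are all hypothetical, it only classifies the unaccented Latin letters, and it assumes EOF is -1 so that EOF maps to index 0.

```c
#include <limits.h>
#include <stdio.h>   /* for EOF */

#define MY_ALPHA 0x01   /* hypothetical flag bit for "alphabetic" */

/* One slot for EOF (assumed to be -1) plus one per unsigned char value.
 * Static storage, so every entry starts out as 0. */
static unsigned char my_ctype_bits[UCHAR_MAX + 2];

/* Fill in the table; must be called once before my_isalpha is used. */
static void my_ctype_init(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (const char *p = letters; *p != '\0'; ++p)
        my_ctype_bits[(unsigned char)*p + 1] |= MY_ALPHA;
}

/* Evaluates c exactly once. Valid only for EOF and 0..UCHAR_MAX;
 * any other value indexes outside the array, which is the
 * undefined behavior described above. */
#define my_isalpha(c) (my_ctype_bits[(c) + 1] & MY_ALPHA)
```

After my_ctype_init() has run, my_isalpha('a') is nonzero while my_isalpha('1') and my_isalpha(EOF) are zero. Something like my_isalpha(1000000) would read far past the end of the 257-entry table, which is one plausible way a large int could make a real implementation crash.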

Jonathan Leffler
  • A better implementation is `#define isalpha(c) (((unsigned)(c)|32)-'a'<26)`. :-) – R.. GitHub STOP HELPING ICE Jul 24 '12 at 04:25
  • Thanks Jonathan! Seems like they should have an additional disclaimer if they want to implement it as in your example and be accurate: ", and EOF must be defined as -1". – Scooter Jul 24 '12 at 04:54
  • @R.: Except that only works in the C locale, whereas with Jonathan's method it's trivial to switch out the array depending on the current locale. But of course I know that you know that ;) – caf Jul 24 '12 at 05:17
  • Actually it also works in any UTF-8 locale, since the first 128 bytes match ASCII and the rest do not represent characters by themselves (thus requiring the `isw*` functions). – R.. GitHub STOP HELPING ICE Jul 24 '12 at 14:17
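
For reference, the arithmetic trick from the first comment can be written as a function to show why it tolerates any int value. This is a sketch that assumes an ASCII-compatible execution character set; the name ascii_isalpha is hypothetical.

```c
/* Hypothetical function form of the macro from the comment above.
 * OR-ing with 32 folds ASCII upper case onto lower case, and the
 * unsigned subtraction wraps around for anything below 'a', so a
 * single comparison covers both range checks. No table is indexed,
 * so out-of-range arguments simply yield 0. */
static int ascii_isalpha(int c)
{
    return (((unsigned)c | 32) - 'a') < 26;
}
```

With this version, ascii_isalpha('A') and ascii_isalpha('z') return 1, while '@', '[', -1 and 1000000 all return 0. As the comments note, it ignores the current locale.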