The standard C library has various character type testing functions. They are declared in the <ctype.h> header.
Unfortunately, the obvious way of using these functions is often wrong. They take an argument of type int, whose value is expected to be representable as an unsigned char (a byte value, effectively, in the range 0 to UCHAR_MAX) or to equal the macro EOF. If you pass in a char value which happens to be negative (and does not happen to equal EOF), undefined behavior ensues: the call might work by coincidence, crash, or, worse yet, form a vulnerability similar to Heartbleed (possibly worse).
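For reference, the declaration and its contract look roughly like this; the comment paraphrases the requirement from the C standard rather than quoting any particular implementation's <ctype.h>:

int isalpha(int c);  /* c must be representable as an unsigned char, or equal EOF */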
Therefore the cast to (unsigned char) is quite likely necessary in the following:
#include <ctype.h>
/* ... */
char ch;
/* ... */
if (isalpha((unsigned char) ch) || ch == ' ') {
    /* ch is an alphabetic character, or a space */
}
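Since it is easy to forget the cast at one call site or another, a common approach is to wrap each classification function you use in a small helper that takes a plain char and applies the cast in exactly one place. This is only a sketch; the wrapper names (my_isalpha and so on) are made up for illustration:

#include <ctype.h>
#include <stdio.h>

/* Hypothetical wrappers: accept a plain char and apply the cast once, here. */
static int my_isalpha(char c) { return isalpha((unsigned char) c); }
static int my_isspace(char c) { return isspace((unsigned char) c); }

int main(void)
{
    char ch = 'a';
    if (my_isalpha(ch) || my_isspace(ch))
        printf("alphabetic or whitespace\n");
    return 0;
}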
Simple character constants (not numeric escapes) for characters in the basic C character set are guaranteed to have positive values in the execution environment; code which can safely assume that it only ever manipulates such characters can do without the cast. (For instance, this holds if all the data being manipulated by the program comes from string or character literals in the program itself, and all of those literals use nothing but the basic C character set.)
That is to say, isalpha('a') is safe: a is in the basic C character set, so the value of the character constant 'a' is positive. But say you're working with source code encoded in ISO-8859-1 and write char ch = 'à';. If char is signed, this ch will have a negative value, which ISO C permits because an accented à isn't in the basic character set. The expression isalpha(ch) then passes a negative value to the isalpha function, which is wrong.
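Here is a minimal sketch of that failure mode and its fix, assuming an ISO-8859-1 execution character set (where à is byte 0xE0) and a signed char type; whether isalpha actually reports the byte as alphabetic still depends on the current locale:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char ch = '\xE0';                 /* 'à' in ISO-8859-1; negative when char is signed */

    /* isalpha(ch) would pass that negative value: undefined behavior. */

    if (isalpha((unsigned char) ch))  /* the cast yields the byte value 0xE0, i.e. 224 */
        printf("alphabetic in the current locale\n");
    else
        printf("not classified as alphabetic in the current locale\n");

    return 0;
}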