11

Why do character functions accept an int argument instead of a char argument?

<ctype.h>

int isalnum(int c); 
int isalpha(int c); 
int iscntrl(int c); 
int isdigit(int c); 
int isgraph(int c); 
int islower(int c); 
int isprint(int c); 
int ispunct(int c); 
int isspace(int c); 
int isupper(int c); 
int isxdigit(int c); 
int tolower(int c); 
int toupper(int c); 
Amir Saniyan
  • I suspect the answer is similar to the one given [here](http://stackoverflow.com/questions/433895/why-are-c-character-literals-ints-instead-of-chars). In C, character literals are of type `int`. – Cody Gray - on strike Feb 16 '12 at 07:42
  • @Cody: the two decisions may be related, in that the correct datatype for doing "calculations" on characters in C is `int`. But literals having the same type as these functions' parameters isn't as simple as it looks. You can write `isalnum('a')`, but you are not guaranteed to be able to write `isalnum(CHAR_MIN)`, or whatever character literal corresponds to `CHAR_MIN` in your implementation, because it might be negative. To match up properly with these functions, character literals really would need type `unsigned`, but then casting them to `char` would be potentially bad. – Steve Jessop Feb 16 '12 at 08:38

4 Answers

12

Characters and integers are rather tightly knit in C.

A function that reads a character from an input stream, such as `getchar`, must be able to return every possible character value plus a distinct end-of-file indicator.

That means `char` is not wide enough, so these functions use the wider type `int`.
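A minimal sketch of why the wider type matters. `next_byte` and `count_bytes` are hypothetical names used only for illustration; `next_byte` mimics the `getc` contract (a value in 0..`UCHAR_MAX`, or `EOF`) on an in-memory buffer. It assumes the usual case where `int` is wider than `char`.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>  /* size_t */

/* Hypothetical stand-in for getc(): returns the next byte of buf as a
   value in 0..UCHAR_MAX, or EOF once the buffer is exhausted. */
int next_byte(const unsigned char *buf, size_t len, size_t *pos)
{
    if (*pos >= len)
        return EOF;
    return buf[(*pos)++];
}

/* Counts bytes the way a getc() loop would. */
size_t count_bytes(const unsigned char *buf, size_t len)
{
    size_t pos = 0, n = 0;
    int c;   /* int, not char, so that the byte 0xFF and EOF stay distinct */

    while ((c = next_byte(buf, len, &pos)) != EOF)
        n++;
    return n;
}
```

If `c` were declared as `char`, the byte 0xFF could compare equal to `EOF` on implementations where plain `char` is signed, and the loop would stop early.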

The C99 rationale document states:

> Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF. EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code. These macros may thus be efficiently implemented by using the argument as an index into a small array of attributes.
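The "index into a small array of attributes" idea can be sketched roughly as follows. This is not how any particular C library does it; `ATTR_DIGIT`, `attr_table`, and `MY_ISDIGIT` are made-up names, and the sketch assumes `EOF` is -1 so that `c + 1` is always a valid index.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Assumption for this sketch: EOF is -1, so (c + 1) maps EOF to slot 0. */
_Static_assert(EOF == -1, "this sketch assumes EOF == -1");

#define ATTR_DIGIT 0x01   /* hypothetical attribute bit */

/* One slot for EOF plus one per unsigned char value.
   Entries not listed here are zero-initialized. */
static const unsigned char attr_table[UCHAR_MAX + 2] = {
    ['0' + 1] = ATTR_DIGIT, ['1' + 1] = ATTR_DIGIT,
    ['2' + 1] = ATTR_DIGIT, ['3' + 1] = ATTR_DIGIT,
    ['4' + 1] = ATTR_DIGIT, ['5' + 1] = ATTR_DIGIT,
    ['6' + 1] = ATTR_DIGIT, ['7' + 1] = ATTR_DIGIT,
    ['8' + 1] = ATTR_DIGIT, ['9' + 1] = ATTR_DIGIT,
};

/* Valid only for EOF and 0..UCHAR_MAX; anything else indexes out of bounds,
   which is one way to see why the standard leaves that case undefined. */
#define MY_ISDIGIT(c) (attr_table[(c) + 1] & ATTR_DIGIT)
```

With this layout, `MY_ISDIGIT(EOF)` reads slot 0 and yields 0, while a negative `char` value such as -2 would read before the start of the table.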

The standard itself has this to say:

> The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
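In practice this means a plain `char` taken from a string should be converted to `unsigned char` before it is handed to these functions, because plain `char` may be signed and hold negative values. A small sketch, with `count_alpha` as a hypothetical helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the alphabetic characters in a NUL-terminated string. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument within 0..UCHAR_MAX even when
           plain char is signed and *s is negative. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Calling `isalpha(*s)` directly would work for ordinary ASCII text but would be undefined for bytes such as `'\xFF'` on implementations where `char` is signed.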

paxdiablo
  • "The next biggest type" would actually be `short`. But, when these were invented a `short` would be promoted to `int` just like a `char` would. – Jerry Coffin Feb 16 '12 at 07:48
  • @JerryCoffin What do you mean? All shorts are still promoted to ints, whenever they are used in an expression. – Lundin Feb 16 '12 at 07:55
  • @AmirSaniyan: `while ((mychar = tolower(getchar())) != EOF) { /* do stuff */ }` Technically, non-ASCII values return undefined values, but when has "this behavior is undefined" ever stopped anyone from relying on it? – tbert Feb 16 '12 at 08:07
  • @Jerry: more importantly, when these were invented `short` was the same size as `char` in many places. Although the standard doesn't explicitly require it, `int` is larger than `char` pretty much everywhere. If they were the same size, the implementer would need a special "reserved" negative value, that isn't a code point in the execution character set and can never be read from any kind of input (including e.g. a binary file stream), and use that as EOF. I'm not sure that's legal, since it would mean there's a value of `char` that cannot be written to a file and read back. – Steve Jessop Feb 16 '12 at 08:29
  • @SteveJessop: what compiler had char and short the same size? I'm pretty sure none of the AT&T compilers, nor Whitesmiths did. I can remember a fair number of really early compilers (e.g., BDS C) that didn't have `short` at all, but none that had it the same size as `char`. – Jerry Coffin Feb 16 '12 at 15:07
  • @Jerry: sorry, I was confused. For some reason I was thinking that `short` was often 8 bits on 8-bit and 16-bit systems, but of course that wouldn't conform. Thinking about it, I'm not sure I've ever actually used `short` myself. – Steve Jessop Feb 16 '12 at 15:12
5

When C was first invented, there was no compile-time checking of function arguments. If one called foo(bar,boz), and bar and boz were of type int, the compiler would push two int values on the stack, call foo, and hope it was expecting to get two int values. Since integer types smaller than int are promoted to int when evaluating expressions, C functions which were written prior to the invention of prototypes could not pass any smaller integer type.
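The same promotion still applies today to arguments passed through `...`, so a variadic function gives a rough picture of what every pre-prototype call looked like. In this sketch `sum_chars` is a hypothetical function; the point is that a `char` argument has to be fetched as `int`.

```c
#include <stdarg.h>

/* Sums `count` character arguments passed through "...". */
int sum_chars(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* A char passed here has already been promoted to int by the
           default argument promotions, so it is read back as int. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

For example, with `char a = 1, b = 2;` the call `sum_chars(2, a, b)` receives two `int` values, not two `char` values.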

supercat
3

They have to accept EOF in addition to normal character values. They also predate the invention of function prototypes. At that time, there was no way to pass a char to a function -- it was always promoted to int first.

Jerry Coffin
0

Yes, it could be to accommodate EOF, which is always a non-char value. The exact value of EOF can vary between systems, but it will never be the same as any character code.

Gargi Srinivas