
The book The C Programming Language by Kernighan and Ritchie, second edition states on page 43 in the chapter about Type Conversions:

Another example of char to int conversion is the function lower, which maps a single character to lower case for the ASCII character set. If the character is not an upper case letter, lower returns it unchanged.

/* lower: convert c to lower case; ASCII only */
int lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    else
        return c;
}

It isn't mentioned explicitly in the text, so I'd like to make sure I understand it correctly: the conversion happens when you call the lower function with a variable of type char, doesn't it? In particular, the expression

c >= 'A'

has nothing to do with a conversion from int to char, since a character constant like 'A' is treated as an int from the start, right? Edit: Or is this different in ANSI C, which the book covers (e.g. a character constant being treated as a char)?

efie

3 Answers


Character constants have type int, as you expected, so you are correct that there are no promotions to int in this function.

Any promotion that may occur would happen if a variable of type char is passed to this function, and this is most likely what the text is referring to.

The type of character constants is int in both the current C17 standard (section 6.4.4.4p10):

An integer character constant has type int

And in the C89 / ANSI C standard (section 3.1.3.4 under Semantics):

An integer character constant has type int

The latter of which is what K&R Second Edition refers to.
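The claim about types can be checked directly on a C11 compiler. The following is only a minimal sketch: the macros IS_INT and IS_CHAR and the three function names are hypothetical helpers invented for this illustration, and it assumes the compiler supports the C11 _Generic selection.

```c
#include <stdio.h>

/* Hypothetical helper macros (not from K&R or the standard library):
   each evaluates to 1 when the expression has the named type. */
#define IS_INT(x)  _Generic((x), int: 1, default: 0)
#define IS_CHAR(x) _Generic((x), char: 1, default: 0)

/* 1 if a character constant has type int. */
int constant_is_int(void)
{
    return IS_INT('A');
}

/* 1 if a variable declared char keeps the type char. */
int variable_is_char(void)
{
    char c = 'A';
    return IS_CHAR(c);
}

/* 1 if a char operand is promoted to int as soon as it is used in
   arithmetic or in a comparison such as c >= 'A'. */
int promoted_is_int(void)
{
    char c = 'A';
    return IS_INT(c + 0) && IS_INT(c >= 'A');
}

/* Prints the three results; each is expected to be 1 in C. */
void print_report(void)
{
    printf("%d %d %d\n",
           constant_is_int(), variable_is_char(), promoted_is_int());
}
```

Compiled as C, all three functions should return 1. Compiled as C++, the first check would differ, because there a character literal has type char.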

dbush
  • K&R precedes the C standard. Is there reason to believe that, in their text, the type of `'a'` was `int` and not `char`? – Eric Postpischil Dec 25 '18 at 23:46
  • @EricPostpischil On page 37 the book says: "A *character constant* is an integer, written as one character within single quotes, such as `'x'`". I'm not sure if "is an integer" means "is of type `int`"? – efie Dec 25 '18 at 23:53
  • @efie: `char`, `short`, `int`, and `long` are all integer types, and an integer is just a value, so I do not think their reference to a character as an integer is informative. – Eric Postpischil Dec 25 '18 at 23:54
  • That makes sense, thanks. Regarding your first comment: On page two the book says: "The second edition of *The C Programming Language* describes C as defined by the ANSI standard." – efie Dec 26 '18 at 00:02

K&R C is old. Really old. Many particulars of K&R C are no longer true in up-to-date standard C.

In standard, up-to-date C11, there is no conversion to/from char in the function you posted:

/* lower: convert c to lower case; ASCII only */
int lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    else
        return c;
}

The function accepts int arguments as int c, and per 6.4.4.4 Character constants of the C standard, character literals are of type int.

Thus the entire lower function, as posted, under C11 deals entirely with int values.

The conversion, if any, happens when the function is called:

char upperA = 'A';

// The char value of upperA is implicitly converted to int for the call;
// the int result is then converted back to char by the initialization.
char lowerA = lower( upperA );
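To make the two call-site conversions visible, they can be written out as separate steps. This is only a sketch: lower_char is a hypothetical wrapper introduced for illustration, and, like lower itself, it assumes an ASCII execution character set.

```c
/* lower: convert c to lower case; ASCII only (from K&R, as quoted above) */
int lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    else
        return c;
}

/* Hypothetical wrapper: spells out the conversions that happen
   implicitly in `char lowerA = lower(upperA);`. */
char lower_char(char ch)
{
    int widened = ch;             /* char -> int, value preserved */
    int result = lower(widened);  /* all arithmetic is done in int */
    return (char)result;          /* int -> char; the result is either a
                                     letter or the original char value,
                                     so it fits */
}
```

Calling lower_char('A') should give 'a', while non-letters such as '!' pass through unchanged. Nothing inside lower ever sees a char.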

Note that this is one of the differences between C and C++. In C++, character literals are of type char, not int.

Andrew Henle

How exactly is this function an example of a char to int conversion?

/* lower: convert c to lower case; ASCII only */
int lower(int c) {
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    else
        return c;
}

Taken by itself, it is not an example of a char to int conversion. The book's description is technically incorrect.


The text goes on to discuss tolower(c) as an alternative to lower(), since it "works" correctly even if [A-Z] are not encoded consecutively, as in EBCDIC.

What is not discussed is that tolower() and the other <ctype.h> functions (is...()) are only specified for int values in the unsigned char range and EOF (C11 §7.4 ¶1). Other values invoke undefined behavior (UB).

It is this requirement that makes these Standard C library functions conceptually char to int conversions, since only values in (roughly) the char range are specified and the result is an int.
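The domain rule can be expressed as a small predicate. What follows is a sketch under stated assumptions: in_ctype_domain and lower_char_safe are hypothetical names made up for this example, not library functions.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: 1 if `value` is a valid argument for the
   <ctype.h> functions, i.e. EOF or a value in the unsigned char range. */
int in_ctype_domain(int value)
{
    return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}

/* Hypothetical wrapper: lower-case a plain char without risking UB.
   Where char is signed, ch may be negative, so it is first converted to
   unsigned char, which always yields a value in 0..UCHAR_MAX. */
char lower_char_safe(char ch)
{
    int arg = (unsigned char)ch;
    return (char)tolower(arg);
}
```

On a platform where plain char is signed, a byte such as 0xE9 stored in a char is negative and so falls outside the domain. After the cast to unsigned char the same byte becomes 233, which is inside it.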


Now look at code where char conversion does occur.

#include <ctype.h>  // tolower(); lower() is the K&R function shown above

void my_strtolower1(char *s) {
  while (*s) {
    *s = lower(*s);  // conversion `char` to `int` and `int` to `char`.
    s++;
  }
} 

void my_strtolower2(char *s) {
  while (*s) {
    *s = tolower(*s); // conversion `char` to `int` and `int` to `char`.
    s++;
  }
} 

void my_strtolower3(char *s) {
  while (*s) {
    // conversion `char` to `unsigned char` to `int` and `int` to `char`.
    *s = tolower((unsigned char) *s); 
    s++;
  }
} 

my_strtolower1() is well defined, yet not functionally correct on rare machines where [A-Z] and [a-z] are not encoded consecutively.

my_strtolower2() has the expected functionality, except that it is technically undefined behavior when *s < 0 (and not EOF).

my_strtolower3() has the expected functionality, without UB when *s < 0.

chux - Reinstate Monica