
While trying to write a simple function that looks for a single char in a string, the way strchr() does, I did the following:

char* findchar(char* str, char c)
{
    char* position = NULL;
    int i = 0;
    for(i = 0; str[i]!='\0';i++)
    {
        if(str[i] == c)
        {
            position = &str[i];
            break;
        }
    }
    return position;
}

So far it works. However, when I looked at the prototype of strchr():

char *strchr(const char *str, int c);

The second parameter is an int? I'm curious to know why it is not a char. Does this mean that we can use an int to store characters just like we use a char?

Which brings me to the second question: I tried changing my function to accept an int as its second parameter, but I'm not sure whether the following is correct and safe:

char* findchar(char* str, int c)
{
    char* position = NULL;
    int i = 0;
    for(i = 0; str[i]!='\0';i++)
    {
        if(str[i] == c) //Specifically, is this line correct? Can we test an int against a char? 
        {
            position = &str[i];
            break;
        }
    }
    return position;
}
360NS
  • The second argument is an `int` for reasons of backwards compatibility between the old pre-standard code for `strchr()` and the C89/C90 standard version. The standard says: _The `strchr` function locates the first occurrence of `c` (converted to a `char`) in the string pointed to by `s`._ When searching for equality, it doesn't matter much whether it is converted to plain `char` (which may be signed or unsigned) or it is converted to `unsigned char`; the rules for `strcmp()` are different: it interprets the values as `unsigned char` because it must order them correctly. – Jonathan Leffler Feb 03 '17 at 07:36
  • @JonathanLeffler Thanks for your reply. If I have understood correctly, the second int parameter in strchr() gets converted to char before actually looking for an occurrence? – 360NS Feb 03 '17 at 07:53
  • That's what the standard says. It means that if you pass a value outside the range `CHAR_MIN .. CHAR_MAX`, then the value is truncated to a `char`. Then the comparison expression inside the function converts both the element of the string and the `char` back to `int` again because that always happens. – Jonathan Leffler Feb 03 '17 at 07:55

2 Answers


Before ANSI C89, functions were declared without prototypes. The declaration for strchr looked like this back then:

char *strchr();

That's it. No parameters are declared at all. Instead, there were these simple rules:

  • all pointers are passed as parameters as-is
  • all integer values of a smaller range than int are converted to int
  • all floating point values are converted to double

So when you called strchr, what really happened was:

strchr(str, (int)chr);

When ANSI C89 was introduced, it had to maintain backwards compatibility. Therefore it defined the prototype of strchr as:

char *strchr(const char *str, int chr);

This preserves the exact behavior of the above sample call, including the conversion to int. This matters because an implementation may pass a char argument differently from an int argument, which makes sense on 8-bit platforms.
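The promotion rule can still be observed in modern C, because arguments passed through `...` undergo the same default argument promotions as arguments to an unprototyped function. Here is a minimal sketch of that; `first_promoted_arg` is a hypothetical helper invented for illustration, not part of any library:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: returns the first variadic argument, read as an int.
   A char or short passed through "..." is promoted to int before the call,
   so it has to be fetched with va_arg(ap, int). */
int first_promoted_arg(int count, ...)
{
    va_list ap;
    int value;

    assert(count >= 1);        /* at least one variadic argument is expected */
    va_start(ap, count);
    value = va_arg(ap, int);   /* a char argument arrives here as an int */
    va_end(ap);
    return value;
}
```

Calling `first_promoted_arg(1, ch)` with a `char ch` hands the value back unchanged as an `int`, which is the same thing that happened to the second argument of an old-style `strchr` call.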

Roland Illig
  • Do you have any source for this? It is not mentioned in the C rationale. – Lundin Feb 03 '17 at 08:40
  • On the contrary, 3.7.1 of the C rationale seems to say the opposite of what you claim here. – Lundin Feb 03 '17 at 08:54
  • AFAIK, prototypes were already included in the second edition of the K&R book. – EOF Feb 03 '17 at 12:28
  • @EOF: Yes, 2nd Edition of K&R includes prototypes and was based originally on the draft standard (and largely checked using C++ compilers). There was quite a long pause between most of the standard being ready and the final version. There were interesting politics involved, related to internationalization (aka I18N), and functions like `localeconv()` were part of that (trigraphs too, IIRC). So, the majority of standard C was known for a year or two (or more) before the standard was published. – Jonathan Leffler Feb 03 '17 at 15:36
  • @Lundin The rationale for ISO C99 gives some reasons in 6.5.2.2, though `strchr` is not mentioned there because it doesn't differ from all the other affected functions. – Roland Illig Feb 03 '17 at 22:44
  • @RolandIllig Actually 6.5.2.2 doesn't say much but I now noticed that it points to 7.1.4: "All library prototypes are specified in terms of the “widened” types: an argument formerly declared as char is now written as int. This ensures that most library functions can be called with or without a prototype in scope, thus maintaining backwards compatibility with pre-C89 code." – Lundin Feb 06 '17 at 07:46
  • However, there are still plenty of separate rationales stating the EOF case too. – Lundin Feb 06 '17 at 07:48

Consider the return value of fgetc(): either a value in the range of unsigned char, or EOF, which is some negative value. This is the kind of value to pass to strchr().
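As a rough sketch of that pattern, the function below takes an `int` exactly as fgetc() would return it and hands it to strchr() without narrowing it to `char` first. The name `is_vowel` and the vowel set are made up for illustration, and the sketch assumes the caller wants EOF and the null character rejected:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ch is an int as returned by fgetc(), so it is either
   EOF or a value in the range of unsigned char. */
int is_vowel(int ch)
{
    /* Reject EOF explicitly, and reject 0 because strchr() treats the
       terminating null character as part of the string being searched. */
    if (ch == EOF || ch == '\0') {
        return 0;
    }
    return strchr("aeiou", ch) != NULL;
}
```

A typical caller would be a loop such as `while ((ch = fgetc(fp)) != EOF) if (is_vowel(ch)) count++;`, with `ch` declared as `int` so that EOF stays distinguishable from every character value.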

@Roland Illig presents a very good explanation of the history that led to retaining use of int ch with strchr().


OP's code fails/has trouble as follows.

1) char* str is treated like unsigned char *str per §7.23.1.1 3

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char

2) i should be type size_t, to handle the entire range of the character array.

3) For the purpose of strchr(), the null character is considered part of the search.

The terminating null character is considered to be part of the string.

4) Better to use const as str is not changed.

char* findchar(const char* str, int c) {
    const char* position = NULL;
    size_t i = 0;
    for(i = 0; ;i++) {
        if((unsigned char) str[i] == c) {
            position = &str[i];
            break;
        }
        if (str[i]=='\0') break;
    }
    return (char *) position;
}

Further detail

The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. C11dr §7.23.5.2 2

So int c is treated like a char. This could imply

        if((unsigned char) str[i] == (char) c) {

Yet I think what is meant is:

        if((unsigned char) str[i] == (unsigned char)(char) c) {

or simply

        if((unsigned char) str[i] == (unsigned char)c) {
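To see why casting both sides matters, here is a minimal sketch of the search using the last form of the comparison. The name `findchar_uc` is made up for this example, and the loop walks a pointer rather than an index, which is a small departure from the code above:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a strchr-like search that compares both operands as
   unsigned char, so bytes above 127 match whether c arrives as a
   negative plain char value or as a value in 0..UCHAR_MAX. */
char *findchar_uc(const char *str, int c)
{
    assert(str != NULL);
    for (;; str++) {
        if ((unsigned char)*str == (unsigned char)c) {
            return (char *)str;   /* also matches the terminator when c is 0 */
        }
        if (*str == '\0') {
            return NULL;          /* reached the end without a match */
        }
    }
}
```

With `const char s[] = "caf\xE9!";`, both `findchar_uc(s, s[3])` and `findchar_uc(s, 0xE9)` find the byte at index 3. Where plain char is signed, `s[3]` is passed as a negative int, so casting only the string side would miss it.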
chux - Reinstate Monica
  • @user7427260 In the end it makes little difference. I used `const char* position` to match `const char* str ...position = &str[i];` and `return (char *) position;` to match the return type of `char *strchr(const char *s, int c);`. It is a compromise that lets `strchr()` accept both `char *` and `const char *`, something that was rectified in C++. – chux - Reinstate Monica Feb 03 '17 at 08:33
  • Thank you very much! So I need to cast to make sure that the test condition is done correctly. Thanks again! – 360NS Feb 03 '17 at 08:34
  • It would generally be simpler (and equally or more efficient) not to have the `i` variable at all, but instead just to rewrite the loop to increment the `position` pointer directly, as in: `for (position = str; *position != '\0'; position++) { if (*position == c) return position; }`. Of course, depending on the type of `i` and other details, a good optimizing compiler *might* be able to recognize that these functions do effectively the same thing, and compile them into the same optimized assembly code. – Ilmari Karonen Feb 03 '17 at 11:59
  • @IlmariKaronen Agreed. A re-write would likely result in better flowing code. Yet that much might lose instructive connection to OP's original code. Unfortunately your suggestion of `for (position = str; *position != '\0'; position++) ...` returns `NULL` when `ch == 0`. Code should return the end of the string address: functionality before performance. – chux - Reinstate Monica Feb 03 '17 at 15:46
  • @chux Good point about the `c == '\0'` case, although one way to handle that would be to add the line `if (c == '\0') return position;` after the loop (and before the final `return NULL;`). – Ilmari Karonen Feb 03 '17 at 21:49
  • @IlmariKaronen This answer's code is not very tight. An example tight solution would be `char* strchr(const char* str, int c) { do { if ((unsigned char) *str == (unsigned char) c) { return (char *) str; } } while (*str++); return NULL; }` – chux - Reinstate Monica Feb 03 '17 at 22:05
  • @chux I forgot to ask you about this: `char* str` is treated like `unsigned char *str`? Why is that? I'm just sticking to ASCII so plain char should be fine? – 360NS Feb 03 '17 at 22:05
  • @user7427260 "I'm just sticking to ASCII" --> does that include everyone who might use or refer to your fine code from now on? If so, yes, no problem. Yet why write code with that limitation when portability is just a small step away? – chux - Reinstate Monica Feb 03 '17 at 22:08
  • @chux So if I have understood correctly, it's done for compatibility with eight-bit character encodings like Extended ASCII? – 360NS Feb 03 '17 at 22:19