4

I'm currently studying C, if this question seems an easy one or a newbie ones, then you know why.

So, I know there are a lot of ways to remove the '\n' left by fgets(), as already discussed on SO here, here and here.

I'm going to focus this topic about these three methods :

  • char *strchr(const char *s, int c);

    if (fgets(sentence, 11, stdin) != NULL) {
        p = strchr(sentence, '\n');
        *p = '\0';
    }
    
  • char *strtok(char *str, const char *delim);

    if (fgets(sentence, 11, stdin) != NULL)
        token = strtok(sentence, "\n");
    
  • size_t strcspn(const char *s, const char *reject);

    if (fgets(sentence, 11, stdin) != NULL)
        sentence[strcspn(sentence, "\n")] = '\0';
    

Assume that the variables p and token are declared as char *p = NULL, *token = NULL;

They do their job but, as far as performance are concerned, do they differ?

Once, surfing on the web (I'm sorry that I haven't proof of this, because I forgot the link) I found that strspn is not a real good way to do that if one is interested in performance, hence my question.

Before posting this on SO I've searched here without finding what I want to know. I've also tried profiling that myself, both using time ./executable and this method found on SO. However I had no luck, because the results were inconsistent.

Can someone help me finding out if I profiled wrong or if they are really equal?

EDIT : Here is the link where I found that strcspn is not efficient.

Claudio Cortese
  • 1,372
  • 2
  • 10
  • 21
  • 2
    I think this comment in your last link is much more useful: "..the whole operation is I/O bound because of fgets. You can save cycles when stripping the newline, but it isn't likely to have a noticeable effect on performance as a whole". That is, you might be better off focusing your efforts elsewhere. – kaylum Jan 24 '16 at 22:23
  • @kaylum I wanted to be sure, I've feeling that here on SO you can be sure of what people tells you. – Claudio Cortese Jan 24 '16 at 22:30

2 Answers2

4

Before we discuss performance, it is important to verify correctness.

Let's look at your methods, and some other popular ones:

strchr

if (fgets(sentence, 11, stdin) != NULL) {
    p = strchr(sentence, '\n');
    *p = '\0';
}

You forgot to test if p is not NULL. This is a big problem because it is possible that sentence does not contain a \n, either because the line read was loo long and only part of it is in sentence or for the last line in the file if it is not terminated by a \n, or if the file contains a null byte. You should write this version this way:

if (fgets(sentence, 11, stdin) != NULL) {
    char *p = strchr(sentence, '\n');
    if (p != NULL)
        *p = '\0';
    ...
}

strchrnul

Some C libraries have a non standard function strchrnul with this prototype:

char *strchrnul(const char *s, int c);

It returns a pointer to the first occurrence of c in the string s or a pointer to the final \0 if no occurrence can be found. This functions allows for a very simple and efficient way to strip the \n:

if (fgets(sentence, 11, stdin) != NULL) {
    *strchrnul(sentence, '\n') = '\0';
    ...
}

The only drawback is that this function is not part of the C Standard and may not be available on some platforms.

strtok

if (fgets(sentence, 11, stdin) != NULL) {
    token = strtok(sentence, "\n");
    ...
}

This version is incorrect: strtok has a side effect on its internal data. This version will interfere with surrounding code that uses strtok. If you bury this method in a function, you will hide this side effect and may cause hard to find bugs for programmers using you function. You could possibly use the reentrant version of strtok: strtok_r, but it is not always available.

Furthermore, as user3121023 commented, strtok will not remove the \n if it is at the beginning of the string. This definitely disqualifies this method. (strtok has too many quirks, it should probably be avoided altogether anyway.)

strlen

You did not mention the strlen alternative. I see it quite often written this way:

if (fgets(sentence, 11, stdin) != NULL) {
    sentence[strlen(sentence) - 1] = '\0';
    ...
}

This is incorrect for multiple reasons:

  • sentence might not have a \n as its last character, as already explained for the strchr version. Attempting to remove the \n this way would remove a valid character.

  • sentence might be an empty string, in which case the code would have undefined behavior. For sentence to be empty requires exceptional conditions not specified in the C Standard: if the input stream contains a NUL byte at the beginning of a line, fgets() might return an empty buffer.

For correctness, this method should be implemented this way:

if (fgets(sentence, 11, stdin) != NULL) {
    size_t len = strlen(sentence);
    if (len > 0 && sentence[len - 1] == '\n')
        sentence[--len] = '\0';
    // useful side effect: len has been updated.
    ...
}

strcspn

if (fgets(sentence, 11, stdin) != NULL) {
    sentence[strcspn(sentence, "\n")] = '\0';
    ...
}

This is the simplest version. It works whether sentence contains a \n or not and even for an empty string. It is less likely to be misused by a programmer.

Performance

Whether strcspn is more or less efficient than the other ones depends a lot on the C library implementation and the compiler performance. The performance should be better than that of strtok, since it does only one scan. It is likely less efficient than strchr and even less than strlen, but for correctness, the strlen alternative should also use 2 extra tests for len > 0 and sentence[len - 1] == '\n', reducing performance.

Note that some libraries use compile-time tests that may allow special casing a 1 byte string literal argument. In this case, the compiler can generate inline code that would be even more efficient than strchr.

It is true that strcspn may be implemented in a rather naive way in some C libraries, but this is not the case for the GNU libc, nor for the Apple C Library. Their implementations are quite efficient, and at least gcc uses builtin knowledge of this and other string functions to generate better code.

On environments with an optimized implementation of strchrnul, this method should be hard to beat.

As always, run benchmarks and profile the different alternatives with different compilers, processors, C libraries... different platforms may yield very different results.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • For `fgets()` to return an empty string, the first byte on a line after a newline (or the first byte in the file) needs to be a null byte. That's a problem with `fgets()` — there isn't a way to know how many characters it thinks it read, and if one of the characters it read was a null byte, there's no way to know what else it did read. From that point of view, the POSIX [`getline()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) function scores strongly — it tells you how many bytes it read. – Jonathan Leffler Jan 25 '16 at 03:29
  • @JonathanLeffler: yes, I realized that. It is really a border case. I suspect this to be untested and different C libraries might behave differently to this respect. – chqrlie Jan 25 '16 at 08:42
2

This method as it is written

if (fgets(sentence, 11, stdin) != NULL) {
    p = strchr(sentence,'\n');
    p = '\0';
    //^^ must be *p 
}

is incorrect because the new line character can be absent in the string. In this case pointer p will be equal to NULL and the code snippet will have undefined behaviour.

You need to change it like

    if ( p ) *p = '\0';

or

    if ( ( p = strchr(sentence,'\n') ) != NULL ) *p = '\0';

It is efficient enough because only one character is searched.

However its drawback is that you need an additional variable that will point to the new line character.

This method

if (fgets(sentence, 11, stdin) != NULL)
    token = strtok(sentence, "\n");

semantically is not very suitable. Usually the function strtok is used in other context when you need to split a string into tokens. And the function returns a null pointer in case if the string contains only the new line character.

So the most appropriate method is this

if (fgets(sentence, 11, stdin) != NULL)
    sentence[strcspn(sentence, "\n")] = 0;

because it is safe and no additional variable is required.

As for me then in C++, I would use

if ( char *p = strchr( sentence, '\n' ) ) *p = '\0';

and in C, I would use :)

sentence[strcspn(sentence, "\n")] = '\0';
chqrlie
  • 131,814
  • 10
  • 121
  • 189
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • @ClaudioCortese Usually it is used to split a string into tokens. You do not want to split a string. You want to remove the new line character. – Vlad from Moscow Jan 24 '16 at 22:42
  • You are right, but it does work, as you "tokenize" the entire string taken with fgets removing the '\n' from it. – Claudio Cortese Jan 24 '16 at 22:44
  • 1
    @ClaudioCortese Yes it works but it always will take one or two minutes of the reader of the code.:) – Vlad from Moscow Jan 24 '16 at 22:46
  • In C++, I'd use `std::getline()` or `std::istream`s `getline()` method which both discard the newline if they find it. Only a C programmer would use `fgets()` in C++, and need to worry about discarding the newline. – Peter Jan 24 '16 at 23:34
  • @ClaudioCortese: `strtok()` is inappropriate because it changes the state of a global hidden variable. As such, `strtok()` must be used with precautions: Using `strtok()` to strip the newline may break other code surrounding the call to `fgets()` that uses `strtok()` for another purpose and relies of the internal state being preserved between calls. `sentence[strcspn(sentence, "\n")] = '\0';` does not have such side-effects and is preferred. Furthermore, `strtok()` will not remove the `'\n` if it occurs at the beginning of the array... – chqrlie Jul 01 '17 at 11:42