
I have inherited a large code base that contains a utility function to split strings on the ':' character. I understand about 80% of how it works, but I do not understand the *token = '\0'; line.

Any pointers are highly appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token) {

    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;            
    }
    if (delimiter == *str)
        str++;

    *token = '\0';    // what is this line doing?

    //how could the token be correct in the main() after setting it to null terminator 
    //here?

    return str;
} 

int main() {
    char token[MAX_TOKEN_SIZE + 1];  
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token);  

        //if token is empty, set it to "./"
        if ((token != NULL) && (token[0] == '\0')) {
            strcpy(token, "./\0");            
        }

        printf("%s\n", token);
    }
    return 0;
}

The output is correct:

/bin
/sbin
./
./
/usr/bin
Vlad from Moscow
Yoshiro
  • You're copying `char`acters from `str` to `token` and `\0` (NUL) terminating it so you can use the standard string functions and `printf`. – Fiddling Bits Sep 09 '22 at 20:26
  • @Yoshiro This if statement if ((token != NULL) && (token[0] == '\0')) { has a redundant expression. It may be written like if ( token[0] == '\0') { – Vlad from Moscow Sep 09 '22 at 20:32
  • Thanks Vlad from Moscow. I will take your suggestion and modify the code. I will appreciate if you could point out any other inefficiencies in the code. Thank you. – Yoshiro Sep 09 '22 at 20:47
  • Note that if `token` is actually a `char *` instead of a `char []`, the `NULL` check makes sense again. – ndim Sep 09 '22 at 22:06
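To make ndim's point concrete, here is a minimal sketch. The helper name default_if_empty is hypothetical (it is not in the original code base), and it assumes that a non-NULL token has room for at least 3 bytes. Because the parameter is a plain char *, a caller could pass NULL, so the check is no longer redundant:

```c
#include <string.h>

/* Hypothetical helper, not from the original code base.
   Assumes token (when non-NULL) has room for at least 3 bytes. */
void default_if_empty(char *token) {
    /* token is a pointer parameter here, so NULL is a possible value
       and must be ruled out before token[0] is read. */
    if (token != NULL && token[0] == '\0') {
        strcpy(token, "./");
    }
}
```

Inside main, where token is an array, the same condition reduces to token[0] == '\0'.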

3 Answers


For starters, I will point out some redundant code.

This if statement

if ((token != NULL) && (token[0] == '\0')) {

has a redundant subexpression, because token can never be equal to NULL: in main, token is declared as a character array, so it always designates a valid object. So you could write

if (token[0] == '\0') {

Also in the string literal in this statement

strcpy(token, "./\0");

the explicit terminating zero character '\0' is redundant, because every string literal already ends with an implicit '\0'. You can just write

strcpy(token, "./");
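A quick way to see that the explicit "\0" only adds a second, unreachable terminator is to compare the sizes of the two literals. The names below are illustrative only:

```c
#include <string.h>

/* sizeof counts the implicit terminator that every string literal has. */
static const size_t size_plain    = sizeof("./");    /* '.', '/', '\0'       -> 3 */
static const size_t size_explicit = sizeof("./\0");  /* '.', '/', '\0', '\0' -> 4 */

/* strcpy stops at the first '\0', so both literals produce the same string. */
int literals_copy_the_same(void) {
    char a[4], b[4];
    strcpy(a, "./");
    strcpy(b, "./\0");
    return strcmp(a, b) == 0 && strlen(a) == 2;
}
```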

Now, as for your question:

The function splitter copies characters from str into the array token until it encounters the delimiter character (or the end of the string):

while (*str && (delimiter != *str)) {
    *token++ = *str;
    str++;
}

But the copied sequence is not yet a string. A C string must end with the terminating zero character '\0', and this statement

*token = '\0'; 

appends the terminating zero character to the end of the extracted sequence stored in the array token.
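To see where the terminator lands, here is a sketch that uses the same copying loop but also reports how many characters it copied. The function name copy_until is hypothetical; the loop is the one from splitter, with the str++ folded into the same statement:

```c
#include <stddef.h>

/* Hypothetical variant of splitter's copy loop that returns the number of
   characters copied. token must have room for that many chars plus '\0'. */
size_t copy_until(const char *str, char delimiter, char *token) {
    char *start = token;                 /* remember the first element */
    while (*str && delimiter != *str) {
        *token++ = *str++;               /* copy one char, then advance both pointers */
    }
    *token = '\0';                       /* token now points just past the last copied char */
    return (size_t)(token - start);      /* so the '\0' sits at start[count] */
}
```

For example, copy_until("/bin:/sbin", ':', buf) copies four characters and writes '\0' at buf[4]. For an empty field such as ":x" it copies nothing and writes '\0' at buf[0], which is exactly the case main later replaces with "./".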

As for this statement

if (delimiter == *str)
    str++;

then, if the current character *str is equal to delimiter, it cannot be the terminating zero character '\0' (assuming delimiter itself is not '\0'), so the end of the string str has not been reached. In that case the pointer str is incremented past the delimiter before it is returned, so that the caller's next call of the function continues processing the string from the following position.

So initially you have

 const char *env = "/bin:/sbin:::/usr/bin";

The first call copies the characters /bin, followed by the zero character '\0' (that is, the string "/bin"), into the array token. After this call, the pointer returned by the function points to the substring

"/sbin:::/usr/bin"

because the preceding character ':' was skipped by this statement

if (delimiter == *str)
    str++;

within the function.
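The walk-through can be checked by printing the token and the remaining input after each call. This is a sketch only: splitter is copied verbatim from the question, trace_split is a hypothetical helper, and its 201-byte buffer mirrors MAX_TOKEN_SIZE + 1 from the question (so tokens longer than 200 characters would still overflow it):

```c
#include <stdio.h>

/* splitter() exactly as in the question. */
const char *splitter(const char *str, char delimiter, char *token) {
    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;
    }
    if (delimiter == *str)
        str++;
    *token = '\0';
    return str;
}

/* Hypothetical helper: shows each token and where the next call will start. */
void trace_split(const char *env, char delimiter) {
    char token[201];                     /* MAX_TOKEN_SIZE + 1 in the question */
    while (*env) {
        env = splitter(env, delimiter, token);
        printf("token=\"%s\"  rest=\"%s\"\n", token, env);
    }
}
```

Calling trace_split("/bin:/sbin:::/usr/bin", ':') should print "/bin" with rest "/sbin:::/usr/bin", then "/sbin" with rest "::/usr/bin", then two empty tokens (rests ":/usr/bin" and "/usr/bin"), and finally "/usr/bin" with an empty rest.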

Vlad from Moscow

As you have stated, the line in question is setting the char pointed to by token to the nul terminator (as is required by virtually all string-handling functions in C).

But, assuming there are other characters in the extracted token, those will already have been added, sequentially, to the target array by the earlier *token++ = *str; line. Note that this copies a character to the pointed-to element of the array and then increments the pointer (so that it then points to the next char in the string/array).

So, when the while loop has finished, token will be pointing to the element of the array that immediately follows the last character copied in that loop – which is exactly where there needs to be a nul terminator.
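The postfix behaviour can be spelled out by writing the loop body both ways. Both helpers below are hypothetical illustrations: each copies n characters and returns the pointer just past the last one, which is where splitter then writes '\0':

```c
#include <stddef.h>

/* Postfix form, as in splitter: store at *dst, then advance dst. */
char *copy_postfix(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        *dst++ = src[i];
    return dst;                   /* points just past the last stored char */
}

/* Expanded form: the same two steps written as separate statements. */
char *copy_expanded(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        *dst = src[i];            /* store at the current position */
        dst++;                    /* then move to the next element */
    }
    return dst;
}
```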

Adrian Mole
  • Thank you Adrian Mole. Now I understand: we are relying on the postfix increment operator to arrive at the correct position for the null terminator. I have another question, if you do not mind: how is token in main() getting reset for the next iteration of the while loop? I do not see anything setting it to null after we get the token from splitter. Thank you again. – Yoshiro Sep 09 '22 at 20:45
  • @Yoshiro In your `main` function, the call to `splitter` uses the *same* `token` array each time through the loop. That will always 'decay' to a pointer to the first element of the array, so the function will always start there (and overwrite any characters put there by a previous call). – Adrian Mole Sep 09 '22 at 20:52
  • `token` is not *reset* in the `main` function, the next call to `splitter()` copies the next token into the array and sets a null byte at the end. If no more tokens are present, the destination array will contain an empty string. – chqrlie Sep 09 '22 at 20:52
  • Thank you all of you, I have learned so much. Wish I could Accept all of your comments. Thanks again. – Yoshiro Sep 10 '22 at 13:00
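To illustrate the last two answers to that question, here is a sketch that passes the same array twice (splitter is copied from the question; demo_reuse is a hypothetical helper). The second call, for an empty field, writes only token[0], so the old characters are still in memory after the terminator, but no string function looks past it:

```c
#include <stdio.h>

/* splitter() exactly as in the question. */
const char *splitter(const char *str, char delimiter, char *token) {
    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;
    }
    if (delimiter == *str)
        str++;
    *token = '\0';
    return str;
}

/* Hypothetical demo: the same array is reused and nothing clears it in between. */
void demo_reuse(void) {
    char token[16];
    splitter("/sbin:", ':', token);   /* writes '/','s','b','i','n','\0' */
    splitter(":/x", ':', token);      /* empty field: writes only token[0] = '\0' */
    /* token[1] still holds 's' from the first call, but "%s" stops at token[0]. */
    printf("token=\"%s\" stale token[1]='%c'\n", token, token[1]);
}
```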

There are subtle problems in the posted code:

  • the test if ((token != NULL) && (token[0] == '\0')) is redundant: token is an array, hence token != NULL is always true.

  • splitter does not receive the length of the destination array: if the str argument contains a token longer than MAX_TOKEN_SIZE bytes, it will cause undefined behavior because of a buffer overflow.

  • if the delimiter passed to splitter is the null byte, the return value will point beyond the end of the string, potentially causing undefined behavior.

  • the line *token = '\0'; just sets the null terminator at the end of the token copied from str, if any.
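The third point can be observed without dereferencing anything. In this sketch, splitter_original is the question's function under a new name, and splitter_guarded is one possible minimal fix (a hypothetical alternative, not the rewrite that follows): it skips the delimiter only when the current character is not the end of the string.

```c
#include <stddef.h>

/* The question's splitter(), renamed for comparison. */
const char *splitter_original(const char *str, char delimiter, char *token) {
    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;
    }
    if (delimiter == *str)   /* also true at the end when delimiter == '\0' */
        str++;
    *token = '\0';
    return str;
}

/* Hypothetical minimal fix: never step past the string's own '\0'. */
const char *splitter_guarded(const char *str, char delimiter, char *token) {
    while (*str && delimiter != *str) {
        *token++ = *str++;
    }
    if (*str != '\0' && delimiter == *str)   /* skip a real delimiter only */
        str++;
    *token = '\0';
    return str;
}
```

With delimiter '\0' and the input "ab", splitter_original returns a pointer one past the terminating '\0'. Computing that pointer is allowed, but reading through it, as the while (*env) loop in main would, is undefined behavior. splitter_guarded returns a pointer to the terminator itself, so the loop stops.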

Here is a modified version:

#include <stdio.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token, size_t size) {
    size_t i = 0;
    while (*str) {
        char c = *str++;
        if (c == delimiter)
            break;
        if (i + 1 < size)
            token[i++] = c;
    }
    if (i < size) {
        token[i] = '\0';  /* set the null terminator */
    }
    return str;
} 

int main() {
    char token[MAX_TOKEN_SIZE + 1];  
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token, MAX_TOKEN_SIZE + 1);  

        // if token is empty, set it to "./"
        if (*token == '\0') {
            strcpy(token, "./");            
        }
        printf("%s\n", token);
    }
    return 0;
}
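To see the bounded version in action, here is a short sketch with a deliberately small buffer. splitter is copied from the modified version above; demo_truncation and the 4-byte buffer are hypothetical, chosen only to force truncation:

```c
#include <stddef.h>
#include <stdio.h>

/* splitter() from the modified version above. */
const char *splitter(const char *str, char delimiter, char *token, size_t size) {
    size_t i = 0;
    while (*str) {
        char c = *str++;
        if (c == delimiter)
            break;
        if (i + 1 < size)
            token[i++] = c;
    }
    if (i < size) {
        token[i] = '\0';  /* set the null terminator */
    }
    return str;
}

/* Hypothetical demo: a 4-byte buffer keeps at most 3 characters plus '\0'. */
void demo_truncation(void) {
    char token[4];
    const char *rest = splitter("/usr/bin:/x", ':', token, sizeof token);
    printf("token=\"%s\" rest=\"%s\"\n", token, rest);   /* token="/us" rest="/x" */
}
```

The extra characters are silently dropped instead of being written past the end of token, and the rest of the over-long token is still consumed, so the next call starts at "/x".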
chqrlie