What are the differences between strtok and strsep in C

Question

Could someone explain to me what the differences are between strtok() and strsep()? What are their advantages and disadvantages? And why would I pick one over the other?

Jonathan Leffler · Answer 1 · 2017-09-30T21:17:15.753

64

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.
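To make "interleaved" concrete, here is a minimal sketch that takes tokens alternately from two strings with strsep(). It assumes a C library that provides strsep() (glibc or a BSD libc); the helper name interleave_tokens and the '|' output format are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare strsep() */
#endif
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: alternately take one token from `a` and one from `b`,
 * appending each to `out` followed by '|'. Both inputs are modified in place.
 * Each string has its own cursor, so the two scans cannot disturb each other.
 */
static void interleave_tokens(char *a, char *b, const char *delims,
                              char *out, size_t outsz)
{
    char *cur_a = a;   /* strsep() position in the first string  */
    char *cur_b = b;   /* strsep() position in the second string */
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    while (cur_a != NULL || cur_b != NULL) {
        char *toks[2];
        toks[0] = strsep(&cur_a, delims);   /* NULL once `a` is exhausted */
        toks[1] = strsep(&cur_b, delims);   /* NULL once `b` is exhausted */
        for (int i = 0; i < 2; i++) {
            if (toks[i] == NULL)
                continue;
            int n = snprintf(out + used, outsz - used, "%s|", toks[i]);
            if (n < 0 || (size_t)n >= outsz - used)
                return;   /* output buffer full: stop with a truncated result */
            used += (size_t)n;
        }
    }
}
```

Called on writable copies of "a,b,c" and "1,2" with "," as the delimiter set, this should produce "a|1|b|2|c|". Doing the same with plain strtok() is not possible, because the second strtok(b, ...) call would discard the saved position in the first string.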

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() permits multiple delimiters between tokens, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).

When to use them?

You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().
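Writing your own is not much work. The following is a sketch of a strsep() work-alike that uses only standard C (strcspn()); the name my_strsep is hypothetical, and this is one plausible way to write it rather than any particular library's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a strsep() work-alike built from standard C only.
 * my_strsep is a made-up name, not a library function.
 */
static char *my_strsep(char **stringp, const char *delim)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;              /* a previous call returned the last token */

    /* Points at the first delimiter, or at the final '\0' if there is none. */
    char *end = start + strcspn(start, delim);
    if (*end != '\0') {
        *end = '\0';              /* terminate the token over the delimiter */
        *stringp = end + 1;       /* the next scan starts just past it */
    } else {
        *stringp = NULL;          /* no delimiter left: this is the last token */
    }
    return start;
}
```

All the state lives in the caller's pointer, which is why this style of interface is safe to use from library code and from several threads at once (on different strings).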

Why is `strtok()` poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
If your function calls any function that calls strtok(), that will break your function's use of strtok().
If your program is multithreaded, at most one thread can be using strtok() at any given time — across a sequence of strtok() calls.

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

You can use strsep() if it is available.
You can use POSIX's strtok_r() if it is available.
You can use Microsoft's strtok_s() if it is available.
Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():

char *strsep(char **stringp, const char *delim);

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context);

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
               const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in the other two variants on strtok().
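Because POSIX strtok_r() and Microsoft's strtok_s() take the same three arguments, a thin wrapper can hide the difference. This is only a sketch: the wrapper name next_token is made up, and it assumes that _WIN32 reliably indicates the Microsoft three-argument strtok_s().

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* asks for the strtok_r() declaration */
#endif
#include <string.h>

/*
 * Hypothetical wrapper: same calling convention as strtok_r().
 * Pass the string on the first call and NULL afterwards; *state holds
 * the scan position between calls.
 */
static char *next_token(char *s, const char *delims, char **state)
{
#if defined(_WIN32)
    return strtok_s(s, delims, state);   /* Microsoft's three-argument form */
#else
    return strtok_r(s, delims, state);   /* POSIX */
#endif
}
```

The Annex K strtok_s() cannot be slotted in the same way, since it needs the extra rsize_t pointer that tracks the remaining length.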

edited Sep 30 '17 at 21:17

answered Aug 28 '11 at 07:01

Jonathan Leffler


Note that the Annex K `strtok_s()` is declared as: `char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr);` which doesn't match the interface of Microsoft's `strtok_s()` or POSIX's `strtok_r()`. Even if it was implemented, the difference is annoying — it limits the usefulness of the Annex K function. See also [Do you use the TR 24731 'safe' functions?](https://stackoverflow.com/questions/372980/do-you-use-the-tr-24731-safe-functions) – Jonathan Leffler Sep 30 '17 at 20:52
I got very little actual information from your highly rated explanation. I say this only because it seems like you believe your explanation may be rather exhaustive and may make the whole matter quite clear and it probably does to high intelligence people, but it was almost entirely opaque to me. It doesn't really answer the question for me at all. I had to look up the term interleave (alternate layers). So no multithreading I guess. "multiple delimiters between a single token" By 'token' you mean a substring? But every call creates a substring when it writes a '\0'. I am totally confused. – iamoumuamua May 27 '20 at 04:41
I'm sorry that you got no information out of my answer, @iamoumuamua. The specification of [`strtok()`](https://port70.net/~nsz/c/c11/n1570.html#7.24.5.8) says: _A sequence of calls to the `strtok` function breaks the string pointed to by `s1` into a sequence of tokens, each of which is delimited by a character from the string pointed to by `s2`._ So, tokens are what `strtok()` identifies. _[…continued 1…]_ – Jonathan Leffler May 27 '20 at 04:49
_[…continuation 1…]_ The statement "calls to the `strsep()` function on different strings can be interleaved, whereas you cannot do that with `strtok()`" means that you can use `strsep()` to slice and dice two different strings in parallel, taking first a token from `string1` then a token from `string2`, whereas `strtok()` requires you to completely split `string1` before you tackle `string2` or vice versa. Yes, that means no multi-threading with `strtok()`, but it also severely constrains single-threaded programs too. _[…continued 2…]_ – Jonathan Leffler May 27 '20 at 04:52
_[…continuation 2…]_ The tokens found by `strtok()` and `strsep()` are separated by delimiters. With `strtok()`, multiple adjacent delimiters are treated as part of a single gap between tokens (so you cannot have empty tokens with `strtok()`), whereas `strsep()` assumes each token is separated from the next by a single delimiter, and two adjacent delimiters mean there is an empty token between them. – Jonathan Leffler May 27 '20 at 04:54
Those are exactly the sort of sentences I have difficulty parsing. I think I just need to see some concrete examples or a 'for dummies' explanation. Actually I think I understand the multiple delimiter thing now. Thanks. I use strtok() for transforming a block of memory (buffer) after reading a text file into what I think of as a multistring which is a block of memory with contiguous null terminated strings. For me the problem is I cannot get the address of the last substring if there is no delimiter like " " at the end. Trying to figure out if strsep() or strtok_r() may be better but... – iamoumuamua May 27 '20 at 05:02
@iamoumuamua — `strtok()` identifies tokens which are separated by delimiters. Any character that is not a delimiter is part of a token. When called, `strtok()` skips over any leading delimiters, then identifies the first non-delimiter and records its position, which it will return. It skips over one or more non-delimiters which make up the token. When it encounters a delimiter, or reaches the end of the string (a null byte, `'\0'`), it ensures that the token is null terminated. _[…continued…]_ – Jonathan Leffler May 27 '20 at 05:09
_[…continued…]_ It records where it got to in a private variable (which is what causes most of the problems with `strtok()`), and returns the pointer to the start of the token it just found. When it is next called with a `NULL` pointer as the first argument, it retrieves where it got to from the private variable, and resumes its scanning, skipping over any delimiters before finding a non-delimiter, and so on. The sets of delimiter characters can be different for different calls to `strtok()`, even when operating on a single string. _[…continued again…]_ – Jonathan Leffler May 27 '20 at 05:13
_[…next continuation…]_ The `strtok_r()`, `strtok_s()` and `strsep()` functions all avoid the private variable; the user passes storage space to the function for that information. This means that they are thread-safe and reentrant and can be used to analyze different strings in parallel. – Jonathan Leffler May 27 '20 at 05:15
@iamoumuamua — you said "I cannot get the address of the last substring if there is no delimiter … at the end". You get the last token by repeatedly calling `strtok()`, first with a pointer to the start of the string to be analyzed, and thereafter with `NULL` (as the first argument). You get a new token on each call until there are no tokens left, whereupon `strtok()` returns a `NULL` pointer, indicating that there are no more tokens left. You won't know you've read the last token until you try to read the (non-existent) token after the last one. – Jonathan Leffler May 27 '20 at 05:18
That clarified things a lot for me. As I said I use strtok() to sort of divide up a text file buffer into lines and then into words per line and store the pointers to these substrings in arrays. The array of lines is no problem because I can put a '\n' at the end, but for the words in a line if the last word does not have a space at the end I don't think strtok(NULL," ") returns the address of that last word. I am thinking I need one last call to strtok(NULL,"\n") maybe. – iamoumuamua May 27 '20 at 05:20
@iamoumuamua — You may find [Nested `strtok` function problem in C](https://stackoverflow.com/q/4693884/15168) or [Using `strtok()` in a loop](https://stackoverflow.com/q/1509654/15168) informative. Your description of splitting a buffer into lines at newlines `'\n'` and then into words (probably based on white space) is very like the scenarios in those questions. The answers make it clear that `strtok()` is not a good tool to use for such problems — but that `strtok_r()`, `strtok_s()` and `strsep()` are all reasonable tools to use (though `strsep()` has different semantics from the others). – Jonathan Leffler May 27 '20 at 05:26
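The lines-then-words scenario discussed in these comments can be sketched with strtok_r() as follows. The helper name split_words and the fixed-size pointer array are assumptions made for the example.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* asks for the strtok_r() declaration */
#endif
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split `buf` into lines at '\n', then split each line
 * into words at spaces and tabs, storing up to `max` word pointers in `words`.
 * Returns the number of words stored. `buf` is modified in place.
 */
static size_t split_words(char *buf, char **words, size_t max)
{
    size_t n = 0;
    char *line_state;   /* strtok_r() position in the outer (line) scan */
    char *word_state;   /* strtok_r() position in the inner (word) scan */

    for (char *line = strtok_r(buf, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        for (char *word = strtok_r(line, " \t", &word_state);
             word != NULL;
             word = strtok_r(NULL, " \t", &word_state)) {
            if (n == max)
                return n;         /* array full: stop early */
            words[n++] = word;
        }
    }
    return n;
}
```

The last word on a line is returned even when no delimiter follows it, because the end of the string also ends a token. The two separate state variables are what let the inner loop run without disturbing the outer one.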

score 10 · Accepted Answer · edited Dec 06 '14 at 23:35


From The GNU C Library manual - Finding Tokens in a String:

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.
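The test for an empty string that the manual recommends might look like the sketch below. It assumes strsep() is available; the helper name split_nonempty and its parameters are invented for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare strsep() */
#endif
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: store up to `max` non-empty fields of `s` in `fields`
 * and count the empty ones in *skipped. Returns the number of fields stored.
 * `s` is modified in place.
 */
static size_t split_nonempty(char *s, const char *delims,
                             char **fields, size_t max, size_t *skipped)
{
    size_t n = 0;
    char *cursor = s;   /* strsep() advances this; it becomes NULL at the end */
    char *tok;

    *skipped = 0;
    while ((tok = strsep(&cursor, delims)) != NULL) {
        if (*tok == '\0') {       /* adjacent delimiters, or one at either end */
            (*skipped)++;
            continue;
        }
        if (n == max)
            break;                /* array full: stop early */
        fields[n++] = tok;
    }
    return n;
}
```

With the empty strings skipped like this, strsep() yields the same tokens as strtok_r() would; keeping them instead is what makes empty fields detectable.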

edited Dec 06 '14 at 23:35

Cristian Ciupitu


answered Aug 28 '11 at 02:14

George Gaál


Can you give me an example, please? I am a bit confused. – mizuki Aug 28 '11 at 02:18

You can find examples of using these functions if you click on [link](http://www.gnu.org/s/hello/manual/libc/Finding-Tokens-in-a-String.html) :-) Also, please note that the `strsep` function may be absent from your C library. – George Gaál Aug 28 '11 at 02:20

H.S. · Answer 3 · 2020-04-09T04:17:50.150

The first difference between strtok() and strsep() is the way they handle contiguous delimiter characters in the input string.

Contiguous delimiter characters handling by strtok():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; //Contiguous delimiters between bbb and ccc sub-string
    const char* delims = " -";  // delimiters - space and hyphen character
    char* token;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }

    printf ("Original String: %s\n", ptr);

    token = strtok (ptr, delims);
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok (NULL, delims);
    }

    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strtok
Original String: aaa-bbb --ccc-ddd
aaa
bbb
ccc
ddd
Original String: aaa

In the output, you can see the tokens "bbb" and "ccc" one after another. strtok() does not indicate the occurrence of contiguous delimiter characters. Also, strtok() modifies the input string, which is why the final printf() shows only "aaa".

Contiguous delimiter characters handling by strsep():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; //Contiguous delimiters between bbb and ccc sub-string
    const char* delims = " -";  // delimiters - space and hyphen character
    char* token;
    char* ptr1;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }

    ptr1 = ptr;

    printf ("Original String: %s\n", ptr);
    while ((token = strsep(&ptr1, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }

    if (ptr1 == NULL) // This is just to show that the strsep() modifies the pointer passed to it
        printf ("ptr1 is NULL\n");
    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strsep
Original String: aaa-bbb --ccc-ddd
aaa
bbb
<empty>             <==============
<empty>             <==============
ccc
ddd
ptr1 is NULL
Original String: aaa

In the output, you can see two empty strings (shown as <empty>) between bbb and ccc. They come from the run of three delimiters " --" between "bbb" and "ccc". When strsep() found the delimiter character ' ' after "bbb", it replaced it with '\0' and returned "bbb". The next call found the delimiter character '-' immediately, replaced it with '\0', and returned an empty string. The same happened for the second '-'.

Contiguous delimiter characters are indicated when strsep() returns a pointer to a null character (that is, a character with the value '\0').

strsep() modifies the input string as well as the pointer whose address is passed as its first argument.

The second difference is that strtok() relies on a static variable to keep track of the current parse location within a string. As a result, one string must be parsed completely before parsing of a second string begins. That is not the case with strsep().

Calling strtok() when another strtok() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strtok(void)
{
    char str[] ="ttt -vvvv";
    const char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL) {
        printf ("%s\n", token);
        token = strtok (NULL, delims);
    }
    printf ("another_function_callng_strtok: I am done.\n");
}

void function_callng_strtok ()
{
    char str[] ="aaa --bbb-ccc";
    const char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL)
    {
        printf ("%s\n",token);
        another_function_callng_strtok();
        token = strtok (NULL, delims);
    }
}

int main(void) {
    function_callng_strtok();
    return 0;
}

Output:

# ./example2_strtok
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
vvvv
another_function_callng_strtok: I am done.

The function function_callng_strtok() prints only the token "aaa" and none of the remaining tokens of its input string. That is because it calls another_function_callng_strtok(), which itself calls strtok() and, once it has extracted all of its own tokens, leaves strtok()'s saved position at the end of its own string. When control returns to the while loop in function_callng_strtok(), strtok(NULL, delims) finds no more tokens and returns NULL, so the loop condition is false and the loop exits.

Calling strsep() when another strsep() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strsep(void)
{
    char str[] ="ttt -vvvv";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }
    printf ("another_function_callng_strsep: I am done.\n");
}

void function_callng_strsep ()
{
    char str[] ="aaa --bbb-ccc";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
        another_function_callng_strsep();
    }
}

int main(void) {
    function_callng_strsep();
    return 0;
}

Output:

# ./example2_strsep
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
bbb
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
ccc
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.

Here you can see that calling strsep() on another string before one string has been completely parsed makes no difference.

So, the disadvantage of both strtok() and strsep() is that they modify the input string, but strsep() has a couple of advantages over strtok(), as illustrated above.

From strsep:

The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 (``ISO C90'')) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.

For reference:

Hi. The reference links seem invalid. Can you change them to https://www.gnu.org/software/libc/manual/html_node/Finding-Tokens-in-a-String.html or another valid link? – jian, Oct 24 '22 at 15:03
