strtok affects the input buffer

Question

I am using strtok to tokenise a string. Does strtok affect the original buffer? For example:

    char buf[] = "This Is Start Of life";
    char *pch = strtok(buf, " ");
    while (pch)
    {
        printf("%s \n", pch);
        pch = strtok(NULL, " ");
    }
    printf("Original Buffer:: %s ", buf);

Output is::
         This
         Is
         Start
         Of
         life
         Original Buffer:: This

I read that strtok returns a pointer to the next token, so how is buf getting affected? Is there a way to retain the original buffer (without the overhead of an extra copy)?

Follow-on question: from the answers so far, I guess there is no way to retain the buffer. So what if I allocate the original buffer dynamically? If strtok is going to modify it, will there be a memory leak when I free the original buffer, or does strtok take care of freeing the memory?

As long as you retain the pointer returned by `malloc()` (or `calloc()` or `realloc()` — or whatever routine was used to allocate the memory), you can free the pointer with `free()`. The allocation routines don't care about the data in the memory that was allocated; all they require is that you pass a pointer to a chunk of memory that they previously allocated. — Jonathan Leffler, Apr 21 '14 at 07:20
As an unrelated but important note, you may want to consider using a re-entrant form of strtok() that is thread-safe, such as strtok_r on Linux or strtok_s on Windows. In addition, I usually only use strtok when I care about avoiding copying the buffer. In other cases I might turn to a more C++-esque tokenization technique, e.g. http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c – aselle, Apr 21 '14 at 07:48

score 4 · Accepted Answer · edited Apr 21 '14 at 07:13

strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as argument to strtok(). Therefore the original string gets affected.

strtok() breaks up the string: it replaces the delimiter character that ends each token with a null character ('\0') and returns a pointer to the beginning of that token. Therefore, after you run strtok(), those delimiter characters will have been replaced by null characters. You can read link1 link2.

As you can see in the output of the example in link2, the output you are getting is as expected: strtok replaced the first delimiter with '\0', so printing buf with %s stops after "This".

edited Apr 21 '14 at 07:13 by Jonathan Leffler · answered Apr 21 '14 at 06:54 by LearningC

So there is no way to avoid the copy if I want to retain the original buffer? Also, if I allocate the original buffer dynamically and strtok modifies it, will there be a memory leak, or does strtok take care of freeing the memory? – Meluha Apr 21 '14 at 07:08

If you want to retain the original buffer undamaged, you'll have to make a copy and have `strtok()` parse it. Amongst other things, that would allow you to find out which of several different delimiters were zapped by `strtok()`, which is otherwise impossible to find out after the event. `strtok()` does no memory allocation or deallocation; it frees nothing because it allocates nothing. – Jonathan Leffler Apr 21 '14 at 07:15

score 3 · Answer 2 · edited Apr 21 '14 at 07:17

When you call strtok(buf, " ") or strtok(NULL, " "), strtok finds a token and writes a null character ('\0') in place of the delimiter that follows it, which modifies the string. So you need to make a copy of the original string before tokenizing.

Please try the following:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char buf[] = "This Is Start Of life";
        char *buf1;

        /* calloc() allocates the memory and initializes it to zero */
        buf1 = calloc(strlen(buf) + 1, sizeof(char));
        if (buf1 == NULL)
            return 1;

        /* keep an untouched copy before strtok() modifies buf */
        strcpy(buf1, buf);

        char *pch = strtok(buf, " ");
        while (pch)
        {
            printf("%s \n", pch);
            pch = strtok(NULL, " ");
        }
        printf("Original Buffer:: %s ", buf1);

        free(buf1);
        return 0;
    }

Note that the effort `calloc()` puts into zeroing the memory is wasted by the subsequent `strcpy()`. — Jonathan Leffler, Apr 21 '14 at 07:18
