I am using strtok to tokenise a string. Does strtok modify the original buffer? For example:

    char buf[] = "This Is Start Of life";
    char *pch = strtok(buf, " ");
    while (pch)
    {
        printf("%s \n", pch);
        pch = strtok(NULL, " ");
    }
    printf("Original Buffer:: %s ", buf);

Output is::
         This
         Is
         Start
         Of
         life
         Original Buffer:: This

I read that strtok returns a pointer to the next token, so how is buf getting modified? Is there a way to retain the original buffer (without the overhead of an extra copy)?

Follow-on question: from the answers so far, I gather there is no way to retain the buffer. So if I allocate the original buffer dynamically and strtok modifies it, will there be a memory leak when I free the original buffer, or does strtok take care of freeing memory?

Deanie
Meluha
  • As long as you retain the pointer returned by `malloc()` (or `calloc()` or `realloc()` — or whatever routine was used to allocate the memory), you can free the pointer with `free()`. The allocation routines don't care about the data in the memory that was allocated; all they require is that you pass a pointer to a chunk of memory that they previously allocated. – Jonathan Leffler Apr 21 '14 at 07:20
  • As an unrelated but important note, you may want to consider using a re-entrant form of strtok() that is thread-safe, like strtok_r on Linux or strtok_s on Windows. In addition, I usually only use strtok when I care about avoiding copying the buffer. In other cases I might turn to a more C++-esque tokenization technique e.g. http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c – aselle Apr 21 '14 at 07:48
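
To illustrate the re-entrant form mentioned in the comment above, here is a minimal sketch assuming a POSIX system where `strtok_r()` is available. The helper name `count_words_nested` and the `;` record separator are made up for the example. Because each loop keeps its own state pointer, the inner tokenization does not disturb the outer one, which plain `strtok()` cannot do.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() on POSIX systems */
#include <string.h>

/* Hypothetical helper: counts space-separated words across all
 * ';'-separated records in text. text is modified in place. */
int count_words_nested(char *text)
{
    int words = 0;
    char *rec_state;   /* position of the outer (record) scan */
    char *word_state;  /* position of the inner (word) scan */

    for (char *rec = strtok_r(text, ";", &rec_state); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state)) {
        for (char *w = strtok_r(rec, " ", &word_state); w != NULL;
             w = strtok_r(NULL, " ", &word_state))
            words++;
    }
    return words;
}
```

Note that `strtok_r()` still writes `'\0'` into the buffer, just like `strtok()`; it only removes the hidden static state.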

2 Answers

4

strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as the argument to strtok(). To terminate that token, it has to write into the original string, so the original string is modified.

strtok() breaks up the string: it replaces the delimiter character that ends each token with a null character ('\0', not the NULL pointer) and returns a pointer to the beginning of that token. After you run strtok(), those delimiter characters have therefore been overwritten with null characters. You can read link1 link2.

As you can see in the output of the example in link2, the output you are getting is expected: the first delimiter in buf has been replaced with '\0', so printing buf stops after "This".
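
To make the overwriting visible, here is a small sketch (the helper name `tokenize_and_dump` is made up for illustration). It tokenizes the buffer in place and then builds a printable view of all the original bytes, showing each `'\0'` that strtok() wrote as `|`.

```c
#include <stddef.h>
#include <string.h>

/* Tokenize buf in place on spaces, then write a printable view of all
 * len bytes into out (which must hold at least len + 1 bytes): every
 * '\0' inside the buffer is shown as '|'. Returns the number of tokens.
 * len must be the string length measured BEFORE tokenizing. */
size_t tokenize_and_dump(char *buf, size_t len, char *out)
{
    size_t count = 0;
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        count++;

    for (size_t i = 0; i < len; i++)
        out[i] = (buf[i] == '\0') ? '|' : buf[i];
    out[len] = '\0';
    return count;
}
```

Called on the question's string "This Is Start Of life" (21 characters), this returns 5 and fills `out` with `This|Is|Start|Of|life`, while `buf` itself now reads as just "This" when printed with `%s`.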

Jonathan Leffler
LearningC
  • So there is no alternative to copying if I want to retain the original buffer? Also, if I use a dynamic array to create the original buffer and strtok modifies it, will there be a memory leak, or does strtok take care of freeing memory? – Meluha Apr 21 '14 at 07:08
  • 1
    If you want to retain the original buffer undamaged, you'll have to make a copy and have `strtok()` parse it. Amongst other things, that would allow you to find out which of several different delimiters were zapped by `strtok()`, which is otherwise impossible to find out after the event. `strtok()` does no memory allocation or deallocation; it frees nothing because it allocates nothing. – Jonathan Leffler Apr 21 '14 at 07:15
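
As a rough sketch of the point about dynamic buffers (the helper name `count_tokens_heap` is hypothetical): strtok() only writes `'\0'` bytes inside the block, so the pointer returned by `malloc()` is still the one to pass to `free()`, and the token pointers themselves are never freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies text to the heap, tokenizes the copy on
 * spaces, and frees it. Returns the token count, or -1 if the
 * allocation fails. */
int count_tokens_heap(const char *text)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);

    int count = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        count++;  /* tok points inside copy; it must not be freed itself */

    free(copy);   /* the same pointer malloc() returned, so nothing leaks */
    return count;
}
```

For "This Is Start Of life" this returns 5. The allocator does not look at the contents of the block, so the embedded null characters make no difference to `free()`.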
3

When you call strtok(NULL, " "), strtok finds the next token and terminates it by replacing the delimiter that follows it with '\0', which modifies the string. So you need to make a copy of the original string before tokenization.

Please try the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char buf[] = "This Is Start Of life";

    /* calloc() allocates the memory and zero-initializes it */
    char *buf1 = calloc(strlen(buf) + 1, sizeof(char));
    if (buf1 == NULL)
        return EXIT_FAILURE;

    /* Keep an untouched copy before strtok() modifies buf */
    strcpy(buf1, buf);

    char *pch = strtok(buf, " ");
    while (pch)
    {
        printf("%s \n", pch);
        pch = strtok(NULL, " ");
    }
    printf("Original Buffer:: %s ", buf1);

    free(buf1);
    return 0;
}
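
If the goal is to avoid both the copy and the modification, one possible alternative (not what the answer above does) is to skip strtok() and walk the string with the standard `strspn()` and `strcspn()` functions. The sketch below uses a made-up helper name, `print_tokens_readonly`, and assumes a single space as the delimiter.

```c
#include <stdio.h>
#include <string.h>

/* Prints each space-separated token of text without modifying text.
 * Returns the number of tokens found. */
size_t print_tokens_readonly(const char *text)
{
    size_t count = 0;
    const char *p = text;

    for (;;) {
        p += strspn(p, " ");          /* skip leading delimiters */
        if (*p == '\0')
            break;
        size_t n = strcspn(p, " ");   /* length of the current token */
        printf("%.*s\n", (int)n, p);  /* print exactly n characters */
        p += n;
        count++;
    }
    return count;
}
```

The trade-off is that the tokens are not null-terminated strings: each one is a pointer plus a length, so they have to be printed with `%.*s` or copied out individually if a terminated string is needed.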
Jonathan Leffler
Parthiv Shah