
I have a string that looks like 1,3-5,7,9-11 and I'm trying to tokenize it with repeated calls to strtok so that the output looks something like:

1
3
5
7
9
11

My code looks like this:

#include <stdio.h>
#include <string.h>

void tokenize(char *string){
    char *token;
    token = strtok (string,"-");
    while (token != NULL) {
            // ... do some other unrelated stuff ...
            printf("\tToken %s\n", token);
            token = strtok (NULL, ",");
    }
}

int main (int argc,char **argv)
{
    char *token;
    token = strtok (*(argv+1),",");
    while (token != NULL) {
            if (strchr(token,45)){  //45 is ASCII for "-".
                    tokenize(token);
            }
            printf("Token1 %s \n", token);
            token = strtok (NULL, ",");
    }
    return 0;
}

However, when I run the code it ends prematurely and I get:

./tokenizer 1,3-5,7,9-11
Token1 1
        Token 3
        Token 5
Token1 3

but I expect/want something like:

./tokenizer 1,3-5,7,9-11
Token1 1
        Token 3
        Token 5
Token1 7
        Token 9
        Token 11

If I comment out the line that reads tokenize(token); (in other words, if I only call strtok on ","), the output looks as one would expect:

./tokenizer 1,3-5,7,9-11
Token1 1
Token1 3-5
Token1 7
Token1 9-11

So it looks like the problem really is with the subsequent strtok calls on the already tokenized string. I tried to memcpy the memory pointed to by the token pointer, but that didn't really help:

#include <stdio.h>
#include <string.h>

void tokenize(char *string){
    char *token;
    token = strtok (string,"-");
    while (token != NULL) {

            printf("\tToken %s\n", token);
            token = strtok (NULL, ",");
    }
}

int main (int argc,char **argv)
{
    char *token;
    char *temptoken ;
    token = strtok (*(argv+1),",");
    while (token != NULL) {
            if (strchr(token,45)){  //45 is ASCII for "-".
/* added memcpy */  memcpy(temptoken,token,strlen(token)+1);
                    tokenize(temptoken);
            }
            printf("Token1 %s \n", token);
            token = strtok (NULL, ",");
    }
    return 0;
} 


$ ./tokenizer 1,3-5,7,9-11 
Token1 1
        Token 3
        Token 5
Token1 3-5

Any ideas about what I can do to fix the code, where my misunderstanding lies, and how to get the desired output?

2 Answers


You cannot nest strtok() calls, because strtok() uses static memory to save its context between invocations so that it knows the current position in the string being tokenized.

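Use strtok_r() instead. It is a reentrant version of strtok() that keeps no internal state: the caller passes a pointer to a char * in which the current position is stored, so each loop can keep its own.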

Jesferman
    Yes, and just reiterating for the OP (and others) that `strtok` is a rather ancient library function. Don't feel bad for being confused by its designed behavior; you're not the first one. Nobody designs library/API functions like this anymore! – Ben Zotto Jul 13 '17 at 15:21
while (token != NULL) {
        if (strchr(token,45)){  //45 is ASCII for "-".
         /* added memcpy */  memcpy(temptoken,token,strlen(token)+1);
                tokenize(temptoken);
        }
        printf("Token1 %s \n", token);
        token = strtok (NULL, ",");
}

What did you expect?

You find a token delimited by ',', split it with your tokenize function (printing the subtokens), and then print the whole token again. The loop then stops, because strtok keeps a single internal state and your inner calls have overwritten it.

So it works exactly as you wrote it.

You need to use a reentrant version of strtok, such as strtok_r().

You should also return a value from your tokenize function to indicate whether subtokens were found: if not, print the token; if so, don't.

0___________