I am trying to use strtok in C++ to split a string into tokens. However, in roughly one out of five runs the tokens returned by the function are incorrect. Can someone please suggest what the problem might be?

Sample code reproducing the issue I am facing:

#include<iostream>
#include<vector>
#include<string>
#include<cstring>
#include<cstdlib>

using namespace std;
#define DEBUG(x) cout<<x<<endl;


void split(const string &s, const char* delim, vector<string> & v)
{
        DEBUG("Input string to split:"<<s);

        // strtok modifies its input, so tokenize a heap copy of the string and free it afterwards
        char * dup = strdup(s.c_str());
        DEBUG("dup is:"<<dup);
        int i=0;
        char* token = strtok(dup,delim);

        while(token != NULL)
        {
                DEBUG("token is:"<<string(token));
                v.push_back(string(token));
                // passing NULL makes this a subsequent call to strtok:
                // the function continues from where it left off in the previous invocation
                token = strtok(NULL,delim);
        }
        free(dup);
}

int main()
{
        string a ="MOVC R1,R1,#434";

        vector<string> tokens;
        char delims[] = {' ',','};
        split(a,delims,tokens);
        return 0;
}

Sample output:

mayank@Mayank:~/Documents/practice$ ./a.out 
Input string to split:MOVC R1,R1,#434
dup is:MOVC R1,R1,#434
token is:MOVC
token is:R1
token is:R1
token is:#434

mayank@Mayank:~/Documents/practice$ ./a.out 
Input string to split:MOVC R1,R1,#434
dup is:MOVC R1,R1,#434
token is:MO
token is:C
token is:R1
token is:R1
token is:#434

As you can see, in the second run the tokens produced are MO C R1 R1 #434 instead of MOVC R1 R1 #434.

I tried reading the library code too, but I was not able to figure out the mistake. Please help.

EDIT1: My gcc version is: gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12)

Mayank Jain

1 Answer

char delims[] = {' ',','};

should be

char delims[] = " ,";

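You're passing a bare array of two chars with no terminating '\0', but strtok expects its delimiter argument to be a 0-terminated string. The brace-initialized array holds exactly ' ' and ',' and nothing else, whereas the string literal " ," also carries the trailing zero byte. Without that terminator the behaviour is undefined: strtok goes "in the woods" and keeps reading past the end of the declared array, treating whatever bytes happen to follow it in memory as delimiters until it reaches a zero byte. Those bytes can differ from run to run, which is why the result is only wrong some of the time. In your bad run, a 'V' presumably happened to sit right after the array, so MOVC was split into MO and C.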
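To make the difference concrete, here is a minimal sketch of the same splitting logic with a properly terminated delimiter string. The names split_tokens, good_delims and bad_delims are made up for this illustration and are not part of the question's code; the two static_asserts only show that the string literal is one byte longer than the brace-initialized array because of the terminator.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// String literal: three bytes, ' ', ',' and the terminating '\0'.
const char good_delims[] = " ,";
// Brace list: two bytes, no terminator. Never pass this to strtok.
const char bad_delims[] = {' ', ','};

static_assert(sizeof(good_delims) == 3, "literal includes the terminator");
static_assert(sizeof(bad_delims) == 2, "brace list has no terminator");

// Hypothetical helper: split `s` on any character found in `delims`.
// `delims` must be a 0-terminated C string, as strtok requires.
std::vector<std::string> split_tokens(const std::string& s, const char* delims)
{
    assert(delims != nullptr);

    // strtok writes '\0' into its input, so tokenize a private mutable copy.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    std::vector<std::string> out;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        out.emplace_back(tok);
    }
    return out;
}
```

Calling split_tokens("MOVC R1,R1,#434", good_delims) should give the four tokens MOVC, R1, R1 and #434 on every run. Copying into a std::vector<char> instead of using strdup is only a design choice of this sketch: it avoids the manual free, but the strdup version from the question works just as well once the delimiters are terminated.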

Jean-François Fabre