This is how I am using my string tokenizer:

typedef std::string                      string_t;
typedef std::vector<string_t>            stations_t;

void Tokenize(const string_t& str, stations_t& tokens,const string_t& delimiters = " ") {
    string_t::size_type lastPos = str.find_first_not_of(delimiters, 0);
    string_t::size_type pos     = str.find_first_of(delimiters, lastPos);
    while (string_t::npos != pos || string_t::npos != lastPos){
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
}

When I pass the string 1,8003,1,HWH,Kolkata Howrah Junction,,16:10,,1,0 to this, it returns 8 fields, whereas it should return 9; it completely ignores the ,, parts. Can anybody please take a look and help me find the bug?

Avinash

2 Answers

In the example you provide, you want an empty field between "16:10" and "1", right?

The reason you are not getting it is that, once you have extracted the substring "16:10", pos is 43, and find_first_not_of then searches from that position for a character that is not in the delimiter string. The first non-delimiter character is the "1" at position 45, so the empty field between the two commas is skipped.
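The two positions mentioned above can be checked with a short sketch. The names `kLine`, `TokenEnd`, `ResumeAfter` and `CheckSkip` are hypothetical helpers introduced only for illustration, and "," is assumed to be the delimiter passed to the question's function.

```cpp
#include <cassert>
#include <string>

// Input taken from the question.
const std::string kLine = "1,8003,1,HWH,Kolkata Howrah Junction,,16:10,,1,0";

// Position of the delimiter that ends the token starting at `start`.
std::string::size_type TokenEnd(std::string::size_type start) {
    return kLine.find_first_of(",", start);
}

// Where the question's loop resumes after a delimiter at `pos`:
// find_first_not_of steps over every consecutive delimiter.
std::string::size_type ResumeAfter(std::string::size_type pos) {
    return kLine.find_first_not_of(",", pos);
}

// Hypothetical check of the positions quoted in the answer.
void CheckSkip() {
    const std::string::size_type start = kLine.find("16:10");
    const std::string::size_type pos   = TokenEnd(start);
    const std::string::size_type next  = ResumeAfter(pos);
    assert(pos == 43u);        // the comma right after "16:10"
    assert(next == 45u);       // the "1" after the second comma
    assert(next - pos == 2u);  // two commas consumed, one empty field lost
}
```

Because the search resumes two characters later, nothing is pushed for the empty field that sits between the commas at positions 43 and 44.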

I suggest something like this:

void Tokenize2(const string_t& str, stations_t& tokens, const string_t& delimiters = " ") {
    string_t::size_type elem_start = 0;
    string_t::size_type elem_end   = str.find_first_of(delimiters, 0);
    // Each delimiter ends one field, so two adjacent delimiters give an empty field
    while (elem_start != string_t::npos && elem_end != string_t::npos) {
        tokens.push_back(str.substr(elem_start, elem_end - elem_start));
        // elem_end is already on a delimiter, so this is elem_end + 1
        elem_start = str.find_first_of(delimiters, elem_end) + 1;
        elem_end   = str.find_first_of(delimiters, elem_start);
    }

    if (elem_start != string_t::npos) {
        // Get the last element (empty if the string ends with a delimiter)
        tokens.push_back(str.substr(elem_start, elem_end - elem_start));
    }
}
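
To try this approach outside the original program, here is a minimal self-contained sketch of the same idea. It returns the vector instead of filling an output parameter, and the names `SplitKeepEmpty` and `DemoSplit` are hypothetical; neither choice comes from the code above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of the loop above: every delimiter ends a field,
// so consecutive delimiters produce empty strings.
std::vector<std::string> SplitKeepEmpty(const std::string& str,
                                        const std::string& delimiters = " ") {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type end   = str.find_first_of(delimiters, start);
    while (end != std::string::npos) {
        tokens.push_back(str.substr(start, end - start));
        start = end + 1;  // step past the delimiter just found
        end   = str.find_first_of(delimiters, start);
    }
    tokens.push_back(str.substr(start));  // last field, possibly empty
    return tokens;
}

// Usage on the question's input, assuming "," is the delimiter.
void DemoSplit() {
    const std::vector<std::string> fields =
        SplitKeepEmpty("1,8003,1,HWH,Kolkata Howrah Junction,,16:10,,1,0", ",");
    assert(fields.size() == 10u);                    // 8 non-empty + 2 empty
    assert(fields[5].empty() && fields[7].empty());  // the two ",," gaps
    assert(fields[6] == "16:10");
}
```

The question's input contains nine commas, so a splitter that keeps empty fields produces 10 fields, two of them empty.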
Some programmer dude

The bug is in your logic of finding a token.

lastPos = str.find_first_not_of(delimiters, 0);
pos     = str.find_first_of(delimiters, lastPos);

You first look for a character that is not a delimiter and assign its position to lastPos, then you look for the first delimiter after lastPos, assign its position to pos, and take everything between lastPos and pos as a token. The call to find_first_not_of therefore skips any run of consecutive delimiters. You can use the test input

,,,,,,,,22,

and you will find that the first iteration finds the token 22, skipping all the consecutive commas before it.
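
That test can be run directly. The sketch below copies the question's loop into a function that returns its result and checks what it produces for the input above. The names `TokenizeSkippingEmpty` and `DemoSkipping` are hypothetical, and "," is assumed to be the delimiter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The question's loop with its logic unchanged; only the return style differs.
std::vector<std::string> TokenizeSkippingEmpty(const std::string& str,
                                               const std::string& delimiters = " ") {
    std::vector<std::string> tokens;
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos     = str.find_first_of(delimiters, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos) {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skips the whole run of delimiters, so empty fields are never seen
        lastPos = str.find_first_not_of(delimiters, pos);
        pos     = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}

// Runs the test input from the answer through the question's loop.
void DemoSkipping() {
    const std::vector<std::string> tokens =
        TokenizeSkippingEmpty(",,,,,,,,22,", ",");
    assert(tokens.size() == 1u);  // every empty field is dropped
    assert(tokens[0] == "22");
}
```

The same function returns 8 tokens for the string in the question, which matches the count reported there.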

The question "How do I tokenize a string in C++?" has plenty of ways to write a tokenizer.

parapura rajkumar