boost::tokenizer to consider absence of tokens between separators

Question

I am using boost::tokenizer to get ';'-separated fields from a string. I can retrieve the fields as shown in the code below, but I have two questions:

1. Does tokenizer provide any function that returns the number of tokens in a string for a given separator?
2. Suppose the test string has 3 fields, a;b;c. The code below prints all of them, but I need the empty fields too. For example, for the string a;;;b;c the 2nd and 3rd tokens should be empty strings rather than being skipped.

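#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost;
int main()
{
    string data="a;;;;b;c";
    boost::char_separator<char> obj(";");
    boost::tokenizer<boost::char_separator<char> > tokens(data,obj);
    // cout<<endl<<tokens.countTokens();  // does not compile: tokenizer has no such member (see question 1)
    for(boost::tokenizer<boost::char_separator<char> >::iterator it=tokens.begin();
        it!=tokens.end();
        ++it)
    {
        // with this separator the empty fields are dropped (see question 2)
        std::cout<<*it<<endl;
    }
}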

your second question is answered here: http://stackoverflow.com/questions/22331648/boosttokenizer-point-seperated-but-also-keeping-empty-fields — m.s., Oct 29 '15 at 12:30

score 7 · Accepted Answer · answered Oct 29 '15 at 12:48

1) The tokenizer has no count member, but you can count the tokens as the distance between begin() and end().

const size_t count = std::distance(tokens.begin(), tokens.end());

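2) Construct the separator with the keep_empty_tokens policy. The second argument (the delimiters to keep as tokens) is an empty string, not a space.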

boost::char_separator<char> obj(";", "", boost::keep_empty_tokens);

ForEveR

Thanks. It worked. I just read that _space_ is considered a separator by default. So `a;b; c;d` would fetch only a and b because it encounters a space after `b;`. Is there any way to stop the tokenizer from treating spaces as token separators? – anurag86 Oct 29 '15 at 12:53
I set the 2nd parameter to " ", but it is still not working. I did it this way: `boost::char_separator obj(";"," ",boost::keep_empty_tokens);` – anurag86 Oct 29 '15 at 13:08
@anurag86 look, in my code there is no space, but just empty string. – ForEveR Oct 29 '15 at 13:17
I mean I got the answer to the 2 questions I asked. What I am asking now is this: my string contains `;`-separated fields, and if a field itself contains a space, the tokenizer seems to treat the _space_ as a separator too. I don't want the tokenizer to treat _space_ as a separator. How can I do that? – anurag86 Oct 29 '15 at 13:43
@anurag86 if you want to ignore space - just construct separator like "; " in first argument. – ForEveR Oct 29 '15 at 13:56
