The docs for boost::escaped_list_separator provide the following explanation for the second parameter c:
Any character in the string c, is considered to be a separator.
So, I need to split a string on multiple separators while allowing quoted values that may themselves contain those separators:

    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main() {
        std::wstring str = L"2 , 14 33 50 \"AAA BBB\"";
        std::wstring escSep(L"\\");        // escape character
        std::wstring delim(L" \t\r\n,");   // split on spaces, tabs, new lines, commas
        std::wstring quotes(L"\"");        // allow double-quoted values with delimiters within
        boost::escaped_list_separator<wchar_t> separator(escSep, delim, quotes);
        boost::tokenizer<boost::escaped_list_separator<wchar_t>,
                         std::wstring::const_iterator,
                         std::wstring> tok(str, separator);
        for (auto beg = tok.begin(); beg != tok.end(); ++beg)
            std::wcout << *beg << std::endl;
        return 0;
    }

The expected result would be [2; 14; 33; 50; AAA BBB]. However, this code results in a bunch of empty tokens.
A regular boost::char_separator drops all these empty tokens while still honoring every delimiter. It seems that boost::escaped_list_separator also honors all the specified delimiters, but emits empty values between them. Is it true that it produces an empty token whenever multiple consecutive delimiters are encountered? Is there any way to avoid this?
If it's always true that the only unwanted tokens are empty ones, it's easy to test the resulting values and skip them manually. But that can get pretty ugly. For example, imagine strings that each hold 2 actual values, possibly separated by many tabs AND spaces. Specifying the delimiters as L"\t " (i.e. tab and space) will work, but will produce a ton of empty tokens.