
I am parsing a custom metadata file where the separator is a space (' '). The file also contains quoted strings, inside which the space must not act as a separator. So "\"This Space\"" (with the quotes) should be one token, while "This Space" should be two tokens.

There is a question like this here with an answer on how to get this result without using boost::tokenizer. This seems like a standard task for a tokenizer, and I assume it should be possible with boost::tokenizer.

I wrote an example to show what I have done so far:

#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
#include <iostream>

using std::string;
using data = std::vector<string>;

data buildExpected()
{
    string s[] = {"This", "is one of", "42", "lines" };
    return data(s, s + 4);
}

data tokenizeLine(string line)
{
    using namespace boost;
    data d;
    char_separator<char> sep("\" "); // both '"' and ' ' are treated as dropped separators
    tokenizer<char_separator<char>> tokens(line, sep);
    for (string tok : tokens) d.push_back(tok);
    return d;
}

void logData(string id, data &d)
{
    string line = "(" + id + "):";
    bool more = false;
    for (auto s : d)
    {
        if (more) line += "; ";
        more = true;
        line += s;
    }
    std::cout << line << std::endl;
}

int main()
{
    string line = "This \"is one of\" 42 lines";
    data expected = buildExpected();
    data actual = tokenizeLine(line);
    logData("expected", expected);
    logData("actual  ", actual);

}

This is the output on my system:

[sample output]

Johannes

1 Answer


Boost.Tokenizer's char_separator doesn't handle quotes. Its functionality is very simple: it just splits into tokens at each occurrence of a separator. You need to handle the quotes yourself.
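As an illustration of handling the quotes yourself, here is a minimal sketch of a quote-aware replacement for the question's tokenizeLine (the name tokenizeLineManually and the logic are my own, not from the answer; it drops the quote characters and does not support escaped or nested quotes):

#include <string>
#include <vector>

using data = std::vector<std::string>;

// Split on spaces, but treat a double-quoted run as a single token.
data tokenizeLineManually(const std::string &line)
{
    data d;
    std::string tok;
    bool inQuotes = false;
    for (char c : line)
    {
        if (c == '"')
            inQuotes = !inQuotes;        // toggle quoted state, drop the quote itself
        else if (c == ' ' && !inQuotes)
        {
            if (!tok.empty()) { d.push_back(tok); tok.clear(); }  // end of a token
        }
        else
            tok += c;
    }
    if (!tok.empty()) d.push_back(tok);  // flush the last token
    return d;
}

For "This \"is one of\" 42 lines" this returns {"This", "is one of", "42", "lines"}, which matches buildExpected() in the question.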

Andriy Tylychko
  • I guess I need to up the complexity and use a different library. – Johannes Dec 09 '15 at 12:14
  • it shouldn't really be difficult to implement what you need with Boost.Tokenizer, especially if you don't need to support nested quotes, something like "quote with "sub quote"" (see the sketch after these comments) – Andriy Tylychko Dec 09 '15 at 12:16
  • I rolled things like this in the past, but I would rather use something off the shelf. For now I will just loosely couple the tokenizer part as well as I can and replace it when it becomes a problem. – Johannes Dec 09 '15 at 12:25
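Following up on that comment, one way to stay within Boost.Tokenizer is its escaped_list_separator, which understands quote and escape characters out of the box. A minimal sketch, assuming space as the separator and no need to keep the quotes (note that, unlike char_separator, it does not collapse consecutive separators, so repeated spaces would produce empty tokens):

#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string line = "This \"is one of\" 42 lines";

    // escape character '\', separator ' ', quote character '"'
    boost::escaped_list_separator<char> sep("\\", " ", "\"");
    boost::tokenizer<boost::escaped_list_separator<char>> tokens(line, sep);

    for (const std::string &tok : tokens)
        std::cout << "[" << tok << "]\n";  // [This] [is one of] [42] [lines]
}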