I am currently splitting my string on two delimiters, but I would like to extend this to a variable number of delimiters. Right now, I have this function:

void dac_sim::dac_ifs::dac_sim_subcmd_if::parse_cmd(std::string command, std::array<std::string, 2> delimiters)
{
  std::string str = command;
  std::vector< std::string > vec;

  auto it = str.begin(), end = str.end();
  bool res = boost::spirit::qi::parse(it, end,
    boost::spirit::qi::as_string[ *(boost::spirit::qi::char_ - delimiters[0] - delimiters[1]) ] % (boost::spirit::qi::lit(delimiters[0]) | boost::spirit::qi::lit(delimiters[1])),
    vec);

  std::cout << "Parsed:";
  for (auto const& s : vec)
    std::cout << " \"" << s << "\"";
  std::cout << std::endl;
}

But now I want something more generic, with the array size as a template parameter, like this:

template <size_t N>
void dac_sim::dac_ifs::dac_sim_subcmd_if::parse_cmd(std::string command, std::array<std::string, N> delimiters)

In this case, how can I proceed?

sehe
rere

1 Answer

Fold Expressions

Can you use c++17? I'd use fold-expressions:

auto parse_cmd(std::string_view str, auto const&... delim) {
    namespace qi = boost::spirit::qi;
    std::vector<std::string> vec;

    qi::parse(str.begin(), str.end(),
              qi::as_string[*(qi::char_ - ... - delim)] % (qi::lit(delim) | ...) //
                  > qi::eoi,
              vec);

    return vec;
}

Test it Live On Coliru

for (auto input :
     {
         "",
         "|",
         "|,",
         "|,||",
         "foo||bar,qux,stux;net||more||||to,come",
     }) //
{
    fmt::print("{:<15} -> {}\n", fmt::format("'{}'", input), parse_cmd(input, "||", ","));
}

Prints

''              -> [""]
'|'             -> ["|"]
'|,'            -> ["|", ""]
'|,||'          -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux;net", "more", "", "to", "come"]
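If fold expressions are new to you, the following standalone sketch shows the two fold shapes used in the parser expression above. It uses plain integers instead of Spirit parsers, and the names `subtract_all` and `or_all` are made up for illustration only.

```cpp
// Binary left fold: (init - ... - xs) expands to ((init - x1) - x2) - x3.
// In the parser above, init is qi::char_ and each x is one delimiter.
template <typename... Ts>
constexpr int subtract_all(int init, Ts... xs) {
    return (init - ... - xs);
}

// Unary right fold: (xs | ...) expands to x1 | (x2 | x3).
// In the parser above, each x is qi::lit(delim).
template <typename... Ts>
constexpr unsigned or_all(Ts... xs) {
    return (xs | ...);
}
```

Here `subtract_all(100, 1, 2, 3)` evaluates `((100 - 1) - 2) - 3`, and `or_all(1u, 2u, 4u)` evaluates `1u | (2u | 4u)`. Note that a unary fold over `|` does not compile for an empty pack, so a function written this way needs at least one delimiter.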

But You Need Arrays?

You can always use the index-sequence trick to transform into a parameter pack:

template <size_t N>
auto parse_cmd(std::string_view str, std::array<std::string, N> const& delims) {
    return [&]<size_t... I>(std::index_sequence<I...>) {
        return do_parse_cmd(str, delims[I]...);
    }(std::make_index_sequence<N>{});
}

Where do_parse_cmd is the function just shown above. Let's demo with ";" added as a third delimiter: Live On Coliru

std::array<std::string, 3> delimiters{"||", ",", ";"};

for (auto input :
     {
         "",
         "|",
         "|,",
         "|,||",
         "foo||bar,qux,stux;net||more||||to,come",
     }) //
{
    fmt::print("{:<15} -> {}\n", fmt::format("'{}'", input), parse_cmd(input, delimiters));
}

Prints

''              -> [""]
'|'             -> ["|"]
'|,'            -> ["|", ""]
'|,||'          -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux", "net", "more", "", "to", "come"]

Note how stux;net is correctly split now.

Problems

  • versions
  • semantic problems
  • flexibility

Versions

For one, the above requires c++17 for the fold-expressions, and the demos also liberally use c++20 features to make it all easy to demonstrate. If you don't have that, even the c++17 version will become a lot more tedious.
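As a rough idea of what the array-to-pack step looks like without C++20's templated lambdas, here is a sketch that compiles as C++17. `collect` is a hypothetical stand-in for `do_parse_cmd` that only gathers its arguments, so the expansion is visible without pulling in Spirit. The names `expand_impl`, `expand` and `expand_with_apply` are likewise invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for do_parse_cmd: it just collects its arguments.
template <typename... Delims>
std::vector<std::string> collect(Delims const&... delims) {
    return {std::string(delims)...};
}

// Named helper replacing the templated lambda: the indices I... are
// deduced from the std::index_sequence argument.
template <std::size_t N, std::size_t... I>
std::vector<std::string> expand_impl(std::array<std::string, N> const& delims,
                                     std::index_sequence<I...>) {
    return collect(delims[I]...); // collect(delims[0], delims[1], ...)
}

template <std::size_t N>
std::vector<std::string> expand(std::array<std::string, N> const& delims) {
    return expand_impl(delims, std::make_index_sequence<N>{});
}

// Alternative: std::apply treats std::array as tuple-like and performs
// the same expansion with a generic lambda.
template <std::size_t N>
std::vector<std::string> expand_with_apply(std::array<std::string, N> const& delims) {
    return std::apply([](auto const&... d) { return collect(d...); }, delims);
}
```

Both variants pass the array elements on as separate arguments. The price is the extra named helper (or the `std::apply` indirection) that the templated lambda avoids.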

Semantic problems

There's an issue when the caller passes overlapping delimiters in the wrong order. E.g., {":", ":|:"} won't work, but {":|:", ":"} will. That's because the alternatives are tried in the order given: ":" matches first, so the longer ":|:" never gets a chance. You would want to be smarter.
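The ordering effect can be reproduced without Spirit. The sketch below is a hand-written stand-in, not the Qi parser itself: `split_first_match` is a hypothetical helper that tries the delimiters in the order given and takes the first one that matches, which is how I understand the ordered alternative `lit(a) | lit(b)` to behave.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the ordered alternative: delimiters are tried
// in the order given and the first one that matches at `pos` wins.
std::vector<std::string> split_first_match(std::string_view input,
                                           std::vector<std::string> const& delims) {
    std::vector<std::string> tokens(1); // empty input still yields one empty token
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (auto const& d : delims) {
            if (!d.empty() && input.compare(pos, d.size(), d) == 0) {
                pos += d.size();       // consume the delimiter
                tokens.emplace_back(); // start the next token
                matched = true;
                break;                 // first match wins, later ones are ignored
            }
        }
        if (!matched)
            tokens.back() += input[pos++];
    }
    return tokens;
}
```

With the input "a:|:b", the delimiters {":", ":|:"} give the three tokens "a", "|" and "b", because ":" is consumed before ":|:" is ever tried. Passing {":|:", ":"} instead gives the intended "a" and "b".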

Flexibility

You might want to be able to have full-blown parser expression capability instead of fixed string literals. Let me postpone this for later.

Qi Symbols

To support c++11 and solve the semantic issue, let's use qi::symbols:

using tokens = std::vector<std::string>;

template <size_t N> tokens
parse_cmd(std::string const& str, std::array<std::string, N> const& delims) {
    namespace qi = boost::spirit::qi;

    qi::symbols<char> delim;
    for (auto& d : delims)
        delim += d;

    tokens vec;
    parse(str.begin(), str.end(), qi::as_string[*(qi::char_ - delim)] % delim > qi::eoi, vec);
    return vec;
}

This internally builds a Trie, so the order in which delimiters are passed doesn't matter: the longest matching delimiter is always consumed as a single delim match.
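To illustrate the longest-match rule on its own, here is a Spirit-free sketch. It is not how `qi::symbols` is implemented: the hypothetical helper `split_longest_match` uses a plain linear scan over the delimiters instead of a Trie, and only mirrors the observable behaviour described above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the longest-match lookup: at each position,
// pick the longest delimiter that matches, regardless of the order given.
std::vector<std::string> split_longest_match(std::string_view input,
                                             std::vector<std::string> const& delims) {
    std::vector<std::string> tokens(1); // empty input still yields one empty token
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t longest = 0;
        for (auto const& d : delims)
            if (d.size() > longest && input.compare(pos, d.size(), d) == 0)
                longest = d.size();
        if (longest > 0) {
            pos += longest;        // consume the longest matching delimiter
            tokens.emplace_back(); // start the next token
        } else {
            tokens.back() += input[pos++];
        }
    }
    return tokens;
}
```

For the input "a:|:b", both {":", ":|:"} and {":|:", ":"} now produce the tokens "a" and "b", since ":|:" is preferred over ":" wherever both match.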

With the same test: Live On Coliru (c++11)

''              -> [""]
'|'             -> ["|"]
'|,'            -> ["|", ""]
'|,||'          -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux", "net", "more", "", "to", "come"]

Future Proofing

To be completely flexible and compose the parser from any parser expression, you would have to thread the needle in Qi, and accept considerable compile times.

Suffice it to say, I won't recommend it. However, using X3¹ none of this is hard, and you could easily achieve it.

Identical X3 version

Live On Coliru. 'Nuff said

Generalize (Computer, Enhance!)

Basically replacing std::string with auto in the fold-expression variant:

auto parse_cmd(std::string const& str, auto... delims) {
    tokens vec;
    parse(str.begin(), str.end(),
          *(x3::char_ - ... - x3::as_parser(delims)) //
                  % (x3::as_parser(delims) | ...)    //
              > x3::eoi,
          vec);
    return vec;
}

Now you can do funky stuff, like: Live On Coliru

static constexpr auto input = "foo (false) bar (   true ) qux (4.8e-9) <!-- any comment --> quz";
fmt::print("input: '{}'\n", input);

auto test = [](auto name, auto... p) {
    fmt::print("{:>5}: {}\n", name, parse_cmd(input, p...));
};

constexpr auto d = "(" >> x3::double_ >> ")";
constexpr auto b = x3::skip(x3::blank)["(" >> x3::bool_ >> ")"];
constexpr auto x = "<!--" >> *(x3::char_ - "-->") >> "-->";

test("d", d);
test("b", b);
test("x", x);
test("x|b|d", x, b, d);

Printing

input: 'foo (false) bar (   true ) qux (4.8e-9) <!-- any comment --> quz'
    d: ["foo (false) bar (   true ) qux ", " <!-- any comment --> quz"]
    b: ["foo", " bar", " qux (4.8e-9) <!-- any comment --> quz"]
    x: ["foo (false) bar (   true ) qux (4.8e-9) ", " quz"]
x|b|d: ["foo", " bar", " qux ", " ", " quz"]

Summary/TL;DR

Combining parsers in X3 is a joy, and crazy powerful. It will typically still be faster to compile than the Qi parsers.

Note that at no point in this answer did I question why you are reinventing tokenization using a (checks notes) parser generator. Perhaps you should tell me what you're actually building or parsing, and I could give you some real advice on how to use Spirit for great success :)


¹ X3 requires c++14, and will require c++17 in the future

sehe
  • Really really nice answer, I've learned a lot from it. Thank you very very much :) – rere Apr 24 '23 at 07:43
  • answering your question... I am given a command string and need to do some action based on that. For example: I get the command: Createfilenamepath or Writebitnumberportnumbersomething... (there are several possible combinations and several commands) I used std::map for the command (Create for example) and that expects an argument, which would be the filenamepath. At that point I would need a parse to separate the filename from and then do what is needed according to the command. – rere Apr 24 '23 at 07:56