I made it self-contained: Live On Coliru
Now, when you want to understand the X3 grammar - beyond mentally debugging - you can enable
#define BOOST_SPIRIT_X3_DEBUG
This debugs rules. Consider adding some debug-only rules for more detailed info:
auto dbg(auto name, auto p) { return x3::rule<struct _>{name} = p; };
auto name = dbg("name", x3::alpha >> *x3::alnum); // to be improved later
auto length = dbg("length", ':' >> x3::double_);
auto leaf = dbg("leaf", -name);
auto internal = dbg("internal", '(' >> (branch % ',') >> ')' >> -name);
auto subtree = dbg("subtree", leaf | internal);
auto tree = dbg("tree", subtree >> ';');
Now the output will be e.g.: Live
<tree>
<try>;</try>
<subtree>
<try>;</try>
<leaf>
<try>;</try>
<name>
<try>;</try>
<fail/>
</name>
<success>;</success>
</leaf>
<success>;</success>
</subtree>
<success></success>
</tree>
";" -> true true
You can "trace" the rule invocations and results. Now, let's look at the first fail:
<tree>
<try>(,);</try>
<subtree>
<try>(,);</try>
<leaf>
<try>(,);</try>
<name>
<try>(,);</try>
<fail/>
</name>
<success>(,);</success>
</leaf>
<success>(,);</success>
</subtree>
<fail/>
</tree>
"(,);" -> false false
You can see it tries subtree, which tries leaf, which succeeds because leaf
is optional by definition:
auto leaf = -name;
A parser shaped -p
will always succeed. So in a|b
when a = -p
, the alternative b
will never be invoked. Either make name
less optional, or reorder your branches, so an internal
will get a chance before deciding an empty leaf
was matched:
auto subtree = internal | leaf;
Now we get:
void quetzal::newick::test::tree()
";" -> true true
"(,);" -> true true
"(,,(,));" -> true true
"(A,B,(C,D));" -> true true
"(A,B,(C,D)E)F;" -> true true
"(:0.1,:0.2,(:0.3,:0.4):0.5);" -> true true
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;" -> false false
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);" -> true true
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;" -> true true
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;" -> true true
Looking at the one remaining failing parse:
<tree>
<try>(:0.1,:0.2,(:0.3,:0.</try>
<subtree>
<try>(:0.1,:0.2,(:0.3,:0.</try>
<internal>
<try>(:0.1,:0.2,(:0.3,:0.</try>
<branch>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<subtree>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<internal>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<fail/>
</internal>
<leaf>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<name>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<fail/>
</name>
<success>:0.1,:0.2,(:0.3,:0.4</success>
</leaf>
<success>:0.1,:0.2,(:0.3,:0.4</success>
</subtree>
<length>
<try>:0.1,:0.2,(:0.3,:0.4</try>
<success>,:0.2,(:0.3,:0.4):0.</success>
</length>
<success>,:0.2,(:0.3,:0.4):0.</success>
</branch>
<branch>
<try>:0.2,(:0.3,:0.4):0.5</try>
<subtree>
<try>:0.2,(:0.3,:0.4):0.5</try>
<internal>
<try>:0.2,(:0.3,:0.4):0.5</try>
<fail/>
</internal>
<leaf>
<try>:0.2,(:0.3,:0.4):0.5</try>
<name>
<try>:0.2,(:0.3,:0.4):0.5</try>
<fail/>
</name>
<success>:0.2,(:0.3,:0.4):0.5</success>
</leaf>
<success>:0.2,(:0.3,:0.4):0.5</success>
</subtree>
<length>
<try>:0.2,(:0.3,:0.4):0.5</try>
<success>,(:0.3,:0.4):0.5):0.</success>
</length>
<success>,(:0.3,:0.4):0.5):0.</success>
</branch>
<branch>
<try>(:0.3,:0.4):0.5):0.0</try>
<subtree>
<try>(:0.3,:0.4):0.5):0.0</try>
<internal>
<try>(:0.3,:0.4):0.5):0.0</try>
<branch>
<try>:0.3,:0.4):0.5):0.0;</try>
<subtree>
<try>:0.3,:0.4):0.5):0.0;</try>
<internal>
<try>:0.3,:0.4):0.5):0.0;</try>
<fail/>
</internal>
<leaf>
<try>:0.3,:0.4):0.5):0.0;</try>
<name>
<try>:0.3,:0.4):0.5):0.0;</try>
<fail/>
</name>
<success>:0.3,:0.4):0.5):0.0;</success>
</leaf>
<success>:0.3,:0.4):0.5):0.0;</success>
</subtree>
<length>
<try>:0.3,:0.4):0.5):0.0;</try>
<success>,:0.4):0.5):0.0;</success>
</length>
<success>,:0.4):0.5):0.0;</success>
</branch>
<branch>
<try>:0.4):0.5):0.0;</try>
<subtree>
<try>:0.4):0.5):0.0;</try>
<internal>
<try>:0.4):0.5):0.0;</try>
<fail/>
</internal>
<leaf>
<try>:0.4):0.5):0.0;</try>
<name>
<try>:0.4):0.5):0.0;</try>
<fail/>
</name>
<success>:0.4):0.5):0.0;</success>
</leaf>
<success>:0.4):0.5):0.0;</success>
</subtree>
<length>
<try>:0.4):0.5):0.0;</try>
<success>):0.5):0.0;</success>
</length>
<success>):0.5):0.0;</success>
</branch>
<name>
<try>:0.5):0.0;</try>
<fail/>
</name>
<success>:0.5):0.0;</success>
</internal>
<success>:0.5):0.0;</success>
</subtree>
<length>
<try>:0.5):0.0;</try>
<success>):0.0;</success>
</length>
<success>):0.0;</success>
</branch>
<name>
<try>:0.0;</try>
<fail/>
</name>
<success>:0.0;</success>
</internal>
<success>:0.0;</success>
</subtree>
<fail/>
</tree>
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;" -> false false
Looking at the end clearly indicates the problem is that the length (":0.0") is encountered outside the last parentheses, where it is not expected. Perhaps you forgot that you were using tree
as the rule, not branch
? Anyways, you can probably take it from here.
Side Notes
You are using a skipper which will probably make your life unless you make some rules lexeme (like name
). I'd also suggest coding the skipper into your grammar:
auto tree = x3::skip(x3::space) [ subtree >> ';' ];
Note that space
includes newlines, so maybe you really want blank
instead. Finally, you can incorporate the f == l
iterator check into the grammar by appending >> eoi
:
auto tree = x3::skip(x3::space) [ subtree >> ';' >> x3::eoi ];
Full Listing
Also addressing the side notes and removing the debug/expositional stuff:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace quetzal::newick::parser {
x3::rule<struct branch> branch{"branch"};
auto name = x3::lexeme[x3::alpha >> *x3::alnum]; // to be improved later
auto length = ':' >> x3::double_;
auto leaf = -name;
auto internal = '(' >> (branch % ',') >> ')' >> -name;
auto subtree = internal | leaf;
auto tree = x3::skip(x3::blank)[subtree >> ';' >> x3::eoi];
auto branch_def = subtree >> -length;
BOOST_SPIRIT_DEFINE(branch)
} // namespace quetzal::newick::parser
namespace quetzal::newick::test {
void run_tests(auto name, auto p, std::initializer_list<char const*> cases) {
std::cerr << "============ running " << name << " tests:\n";
for (std::string const input : cases)
std::cout << quoted(input) << " \t-> " << std::boolalpha
<< parse(begin(input), end(input), p) << std::endl;
}
void internal() {
run_tests("internal", quetzal::newick::parser::internal,
{
"(,)",
"(A,B)F",
"(A:10,B:10)F",
});
}
void tree() {
run_tests("tree", quetzal::newick::parser::tree,
{
";",
"(,);",
"(,,(,));",
"(A,B,(C,D));",
"(A,B,(C,D)E)F;",
"(:0.1,:0.2,(:0.3,:0.4):0.5);",
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;",
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);",
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;",
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;",
});
}
} // namespace quetzal::newick::test
int main() {
using namespace quetzal::newick::test;
internal();
tree();
}
Prints
============ running internal tests:
"(,)" -> true
"(A,B)F" -> true
"(A:10,B:10)F" -> true
============ running tree tests:
";" -> true
"(,);" -> true
"(,,(,));" -> true
"(A,B,(C,D));" -> true
"(A,B,(C,D)E)F;" -> true
"(:0.1,:0.2,(:0.3,:0.4):0.5);" -> true
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;" -> false
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);" -> true
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;" -> true
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;" -> true