First off, things are rarely a use-case for semantic actions (Boost Spirit: "Semantic actions are evil"?)¹.
Secondly, my gut says optional
is likely unnecessary for the vector/string fields. In my mind, a sequence container is already a generalization of optional
(where optional is like container with a maximum size of 1).
Thirdly, as far as I remember there might be limitations with std::optional
support. Seems I'm mis-remembering from variant
support Transitioning Boost Spirit parser from boost::variant to std::variant.
The Propagation Rules
Now, I'd suggest to keep two things in mind:
Always match the AST declarations as closely to your rule declarations as you can. In particular, your AST doesn't distinguish leaf nodes and internal nodes, which causes confusions because you hope to have both rules "magically" compatible with both.
Be specific about type coercions. x3::rule
does this, and you can use it explicitly. One of my recurring x3 devices is as<T>[p]
(or sometimes the functional version of it: as<T>(p, name)
):
template <typename T> struct as_type {
auto operator[](auto p) const {
struct Tag {};
return x3::rule<Tag, T>{"as"} = p;
}
};
template <typename T> static inline as_type<T> as{};
Note that some (most?) of your comment lines
// synthesize: std::string
// synthesize: double
// synthesize: std::optional<std::string>
// synthesize: std::pair< std::vector<...>, std::optional<std::string> >
// synthesized attribute: std::variant<node, std::optional<std::string>
// synthesize: std::pair<node, std::optional<double>>
// synthesize: std::pair<node, std::optional<double>>
are not very accurate. E.g. alpha >> *alnum
has fusion::deque<char, std::string> >
as attribute type. Mind you, the propagation machinery may be able to iron out the subtle difference when instantiating the x3::move_to
to the actual bound reference.
That actually turns out to be the case, see Simplify, Cleanup below
But as is, this ripples down: leaf
isn't optional<string>
now and so on.
Since you defined branch
without attribute type, its attribute type is actually x3::unused_type
. This means that subtree
doesn't contain a vector<...>
either. I had to check, so made a facility:
template <typename ExpectedAttributeType> void check(auto p) {
static_assert(
std::is_same_v<
ExpectedAttributeType,
typename x3::traits::attribute_of<decltype(p), x3::unused_type>::type>);
};
And now we can see that:
check<std::string>(name); // assuming a fixed `name` rule that coerces std::string
check<std::vector<std::string>>('(' >> name % ',' >> ')');
But for a parser that exposes unused_type
:
check<x3::unused_type>(x3::omit[name]);
check<std::vector<x3::unused_type>>('(' >> x3::omit[name] % ',' >> ')'); // FAILS!
check<x3::unused_type>('(' >> x3::omit[name] % ',' >> ')'); // Passes
See what I mean with keeping close control? The ripple-effect makes for wildly different attribute types from what you expected.
In addition, pair<node, optional<double>>
would not be compatible with any of your AST types (which happens to be just ast::node
of course).
Show, Don't Tell
As I know you to be a fast learner, and I don't know how to best explain the transformations I'd do step by step, I'll just show you the result of applying the above principles to the question code:
x3::rule<class branch, ast::node> node{"node"};
auto name = as<std::string>[x3::lexeme[x3::alpha >> *x3::alnum]];
auto length = as<double>[':' >> x3::double_];
auto leaf = as<std::string>[name | x3::attr(std::string{})];
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree = x3::skip(x3::blank)[node >> ';' >> x3::eoi];
BOOST_SPIRIT_DEFINE(node);
You can your expectations:
void checks() {
check<double>(length);
check<std::string>(leaf);
check<std::string>(name);
check<ast::nodes>('(' >> node % ',' >> ')'); // Passes
//check<ast::nodes>(children); // FAILS, actually variant<ast::nodes, ast::nodes> (!!)
check<ast::nodes>(as<ast::nodes>[children]); // Trivially compatible
check<ast::node>(tree);
}
Note how subtle the inner mechanisms can be. To be completely fair, Spirit Qi used to be way more predictable and forgiving than X3. I suspect the difference might be in favor of maintainability and compilation times?_
Testing
Borrowing some test cases from Grammar fails parsing tree with Boost Spirit X3 and debug-printing the parsed AST:
Live On Coliru prints
============ running internal tests:
"(,)" PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(A,B)F" PASS -> node { [node { [], "A" }, node { [], "B" }], "F" }
"(A:10,B:10)F" PASS -> node { [node { [], "A":10 }, node { [], "B":10 }], "F" }
============ running tree tests:
";" PASS -> node { [], "" }
"(,);" PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(,,(,));" PASS -> node { [node { [], "" }, node { [], "" }, node { [node { [], "" }, node { [], "" }], "" }], "" }
"(A,B,(C,D));" PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "" }], "" }
"(A,B,(C,D)E)F;" PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "E" }], "F" }
"(:0.1,:0.2,(:0.3,:0.4):0.5);" PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "" }
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;" PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "":0 }
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);" PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "":0.5 }], "" }
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;" PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F" }
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;" PASS -> node { [node { [node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F":0.1 }], "A" }
Simplify, Cleanup
After all is done, I usually go in and remove unnecessary rule instantiation.
Turns out, unsurprisingly, that indeed matching the AST to your rules 1:1 makes everything trivially compatible without coercing. In your case I'd probably also remove the optional<double>
, so let me show you in case you're interested:
struct node {
nodes children;
std::string name;
double length;
};
And the rules become:
x3::rule<class branch, ast::node> node{"node"};
auto name = x3::lexeme[x3::alpha >> *x3::alnum];
auto length = ':' >> x3::double_ | x3::attr(0.0);
auto leaf = name | x3::attr(std::string{});
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree = x3::skip(x3::blank)[node >> ';' >> x3::eoi];
BOOST_SPIRIT_DEFINE(node);
For good measure let's reword the checks
to only verify the expected compatibilities:
template <typename ExpectedAttributeType> void compatible(auto p) {
static_assert(
std::is_same_v<ExpectedAttributeType,
typename x3::traits::attribute_of<
decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
x3::unused_type>::type>);
};
void checks() {
compatible<double>(length);
compatible<std::string>(leaf);
compatible<std::string>(name);
compatible<ast::nodes>(children);
compatible<ast::node>(tree);
}
Full Listing
Live On Coliru
#include <boost/fusion/include/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace ast {
using nodes = std::vector<struct node>;
struct node {
nodes children;
std::string name;
double length;
};
// for debug output
static inline std::ostream& operator<<(std::ostream& os, node const& n) {
os << "node { [";
for (auto sep = ""; auto& c : n.children)
os << std::exchange(sep, ", ") << c;
os << "], " << quoted(n.name) ;
if (n.length != 0)
os << ":" << n.length;
return os << " }";
}
} // namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::node, children, name, length)
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace parser {
namespace x3 = boost::spirit::x3;
static x3::rule<class branch, ast::node> const node{"node"};
static auto const name = x3::lexeme[x3::alpha >> *x3::alnum];
static auto const length = ':' >> x3::double_ | x3::attr(0.0);
static auto const leaf = name | x3::attr(std::string{});
static auto const children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
static auto const node_def = children >> leaf >> -length;
static auto const tree = x3::skip(x3::blank)[node >> ';' >> x3::eoi];
BOOST_SPIRIT_DEFINE(node);
template <typename ExpectedAttributeType> void compatible(auto p) {
static_assert(
std::is_same_v<ExpectedAttributeType,
typename x3::traits::attribute_of<
decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
x3::unused_type>::type>);
};
void checks() {
compatible<double>(length);
compatible<std::string>(leaf);
compatible<std::string>(name);
compatible<ast::nodes>(children);
compatible<ast::node>(tree);
}
} // namespace parser
namespace test {
void run_tests(auto name, auto p, std::initializer_list<char const*> cases) {
std::cerr << "============ running " << name << " tests:\n";
for (std::string const input : cases)
{
ast::node n;
auto ok = parse(begin(input), end(input), p, n);
std::cout << quoted(input) << "\t " << (ok ? "PASS" : "FAIL");
if (ok)
std::cout << " -> " << n << std::endl;
else
std::cout << std::endl;
}
}
void internal() {
run_tests("internal", parser::node,
{
"(,)",
"(A,B)F",
"(A:10,B:10)F",
});
}
void tree() {
run_tests("tree", parser::tree,
{
";",
"(,);",
"(,,(,));",
"(A,B,(C,D));",
"(A,B,(C,D)E)F;",
"(:0.1,:0.2,(:0.3,:0.4):0.5);",
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;",
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);",
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;",
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;",
});
}
} // namespace test
int main() {
test::internal();
test::tree();
}
With identical output (except for the suppressed 0.0
length in the debug output).
¹ In fairness in X3 I'll more happily consider it as writing them has become a lot more natural.