Okay, the explanations helped me realize the correspondence between the input and the XML. There are still a number of ... unclear specs, but let's roll with it.
Parsing
AST
As always, I start out with the AST. This time instead of basing it on the sample input, it was easier to base it on the output XML:
namespace Ast {
using boost::recursive_wrapper;
using Id = std::string;
using Literal = std::string;
using Enum = std::vector<Id>;
struct Base {
Id id;
Literal literal;
};
struct Simple : Base {
Enum enumeration;
};
struct Complex;
struct Container;
using Class = boost::variant<
Simple,
recursive_wrapper<Complex>,
recursive_wrapper<Container>
>;
using Classes = std::vector<Class>;
struct Container : Base { Class element; };
struct Complex : Base { Classes members; };
using Task = std::vector<Class>;
} // namespace Ast
So far so good. No surprises. The main thing is using recursive variants to allow nesting complex/container types. As a side note, I reflected the common parts of all types as Base. Let's adapt these for use as Fusion sequences:
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration)
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex, id, literal, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)
Now Spirit will know how to propagate attributes without further help.
Grammar
The skeleton is easy, just mapping AST nodes to rules:
template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
Task() : Task::base_type(start) {
start = skip(space)[task_];
// ...
}
private:
qi::rule<It, Ast::Task()> start;
using Skipper = qi::space_type;
qi::rule<It, Ast::Task(), Skipper> task_;
qi::rule<It, Ast::Class(), Skipper> class_;
qi::rule<It, Ast::Simple(), Skipper> simple_;
qi::rule<It, Ast::Complex(), Skipper> complex_;
qi::rule<It, Ast::Container(), Skipper> container_;
// lexemes:
qi::rule<It, Ast::Id()> id_;
qi::rule<It, Ast::Literal()> literal_;
};
Note I grouped the lexemes (that do not allow a skipper) and encapsulated the space skipper into the start rule.
Because "classes" can appear explicitly, but also without the leading Class keyword, I will introduce an extra rule type_ so we can say:
task_ = *class_ > eoi;
type_ = simple_ | complex_ | container_;
class_ = "Class" > type_ > ';';
And also use type_ wherever Simple/Complex/Container is acceptable.
For the rest, there aren't many surprises, so let's show the whole constructor block:
Task() : Task::base_type(start) {
using namespace qi;
start = skip(space)[task_];
// lexemes:
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay
task_ = *class_ > eoi;
type_ = simple_ | complex_ | container_;
class_ = lit("Class") > type_ > ';';
simple_ = lit("Simple") >> id_ >> optlit >> enum_;
complex_ = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');
BOOST_SPIRIT_DEBUG_NODES(
(task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(id_)(literal_))
}
Note the other "extra" rule (enum_). Of course, I could have kept it all in the simple_ rule instead.
Here's a Live Demo printing the raw AST for the sample input:
- (caption " " {})
- (columns "Column Name" {})
- (CONTAINER_NAME " " (OBJECT_NAME " " {(obj_id " " {}), (obj_property1 " " {}), (obj_attribute " " {EnumOption1, EnumOption2, EnumOption3, EnumOption4}), (OBJECT_ITEMS " " (OBJECT_ITEM " " {(obj_item_name " " {}), (set_value " " (obj_item_value " " {}))}))}))
It's just a shame that all my pretty error handling code is not firing :) The output is obviously pretty ugly, so let's fix that.
Generating XML
I'm not a Microsoft fan, and prefer other libraries for XML anyways (see What XML parser should I use in C++?).
So I'll choose PugiXML here.
Generator
Simply put, we have to teach the computer how to convert any Ast node into XML:
#include <functional> // std::bind, std::placeholders
#include <pugixml.hpp>
namespace Generate {
using namespace Ast;
struct XML {
using Node = pugi::xml_node;
// callable for variant visiting:
template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }
private:
void apply(Node parent, Ast::Class const& c) const {
using std::placeholders::_1;
boost::apply_visitor(std::bind(*this, parent, _1), c);
}
// Id and Literal are both std::string, so they cannot be told apart by
// overloading; the element name is passed explicitly instead
void apply(Node parent, std::string const& s, char const* kind) const {
named_child(parent, kind).text().set(s.c_str());
}
void apply(Node parent, Simple const& s) const {
auto simple = named_child(parent, "simple");
apply(simple, s.id, "identifier");
apply(simple, s.literal, "literal");
apply(simple, s.enumeration);
}
void apply(Node parent, Enum const& e) const {
if (!e.empty()) {
auto enum_ = named_child(parent, "enumeration");
for (auto& v : e)
named_child(enum_, "word").text().set(v.c_str());
}
}
void apply(Node parent, Complex const& c) const {
auto complex_ = named_child(parent, "complex");
apply(complex_, c.id, "identifier");
apply(complex_, c.literal, "literal");
for (auto& m : c.members)
apply(complex_, m);
}
void apply(Node parent, Container const& c) const {
auto cont = named_child(parent, "container");
apply(cont, c.id, "identifier");
apply(cont, c.literal, "literal");
apply(cont, c.element);
}
void apply(Node parent, Task const& t) const {
auto task = named_child(parent, "task");
for (auto& c : t)
apply(task.append_child("class"), c); // top-level entries get a <class> wrapper
}
private:
Node named_child(Node parent, std::string const& name) const {
auto child = parent.append_child();
child.set_name(name.c_str());
return child;
}
};
} // namespace Generate
I'm not gonna say I typed this up error-free in a jiffy, but you'll recognize the pattern: It's following the Ast 1:1 to great success.
FULL DEMO
Integrating all the above, and printing the XML output:
Live On Compiler Explorer
// #define BOOST_SPIRIT_DEBUG 1
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <functional>
#include <iomanip>
#include <iostream>
#include <string_view>
namespace qi = boost::spirit::qi;
namespace Ast {
using boost::recursive_wrapper;
using Id = std::string;
using Literal = std::string;
using Enum = std::vector<Id>;
struct Base {
Id id;
Literal literal;
};
struct Simple : Base {
Enum enumeration;
};
struct Complex;
struct Container;
using Class = boost::variant< //
Simple, //
recursive_wrapper<Complex>, //
recursive_wrapper<Container> //
>;
using Classes = std::vector<Class>;
struct Container : Base { Class element; };
struct Complex : Base { Classes members; };
using Task = std::vector<Class>;
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration)
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex, id, literal, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)
namespace Parser {
template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
Task() : Task::base_type(start) {
using namespace qi;
start = skip(space)[task_];
// lexemes:
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay
task_ = *class_ > eoi;
type_ = simple_ | complex_ | container_;
class_ = lit("Class") > type_ > ';';
simple_ = lit("Simple") >> id_ >> optlit >> enum_;
complex_ = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');
BOOST_SPIRIT_DEBUG_NODES(
(task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(id_)(literal_))
}
private:
qi::rule<It, Ast::Task()> start;
using Skipper = qi::space_type;
qi::rule<It, Ast::Task(), Skipper> task_;
qi::rule<It, Ast::Class(), Skipper> class_, type_;
qi::rule<It, Ast::Simple(), Skipper> simple_;
qi::rule<It, Ast::Complex(), Skipper> complex_;
qi::rule<It, Ast::Container(), Skipper> container_;
qi::rule<It, Ast::Enum(), Skipper> enum_;
// lexemes:
qi::rule<It, Ast::Id()> id_;
qi::rule<It, Ast::Literal()> literal_;
};
}
#include <pugixml.hpp>
namespace Generate {
using namespace Ast;
struct XML {
using Node = pugi::xml_node;
// callable for variant visiting:
template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }
private:
void apply(Node parent, Ast::Class const& c) const {
using std::placeholders::_1;
boost::apply_visitor(std::bind(*this, parent, _1), c);
}
void apply(Node parent, std::string const& s, char const* kind) const {
named_child(parent, kind).text().set(s.c_str());
}
void apply(Node parent, Simple const& s) const {
auto simple = named_child(parent, "simple");
apply(simple, s.id, "identifier");
apply(simple, s.literal, "literal");
apply(simple, s.enumeration);
}
void apply(Node parent, Enum const& e) const {
if (!e.empty()) {
auto enum_ = named_child(parent, "enumeration");
for (auto& v : e)
named_child(enum_, "word").text().set(v.c_str());
}
}
void apply(Node parent, Complex const& c) const {
auto complex_ = named_child(parent, "complex");
apply(complex_, c.id, "identifier");
apply(complex_, c.literal, "literal");
for (auto& m : c.members)
apply(complex_, m);
}
void apply(Node parent, Container const& c) const {
auto cont = named_child(parent, "container");
apply(cont, c.id, "identifier");
apply(cont, c.literal, "literal");
apply(cont, c.element);
}
void apply(Node parent, Task const& t) const {
auto task = named_child(parent, "task");
for (auto& c : t)
apply(task.append_child("class"), c);
}
private:
Node named_child(Node parent, std::string const& name) const {
auto child = parent.append_child();
child.set_name(name.c_str());
return child;
}
};
} // namespace Generate
int main() {
using It = std::string_view::const_iterator;
static const Parser::Task<It> p;
static const Generate::XML to_xml;
for (std::string_view input :
{
R"(Class Simple caption;
Class Simple columns "Column Name";
Class Container CONTAINER_NAME (
Complex OBJECT_NAME (
Simple obj_id
Simple obj_property1
Simple obj_attribute enumeration(EnumOption1, EnumOption2,EnumOption3,EnumOption4)
Container OBJECT_ITEMS (
Complex OBJECT_ITEM (
Simple obj_item_name
Container set_value (
Simple obj_item_value
)
)
)
)
);)",
}) //
{
try {
Ast::Task t;
if (qi::parse(begin(input), end(input), p, t)) {
pugi::xml_document doc;
to_xml(doc.root(), t);
doc.print(std::cout, " ", pugi::format_default);
std::cout << std::endl;
} else {
std::cout << " -> INVALID" << std::endl;
}
} catch (qi::expectation_failure<It> const& ef) {
auto f = begin(input);
// named pos rather than p, so it does not shadow the parser instance
auto pos = static_cast<std::size_t>(ef.first - f);
auto bol = input.find_last_of("\r\n", pos) + 1; // npos + 1 wraps around to 0
auto line = std::count(f, f + bol, '\n') + 1;
auto eol = input.find_first_of("\r\n", pos);
std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
<< input.substr(bol, eol - bol) << "\n"
<< std::setw(static_cast<int>(pos - bol)) << ""
<< "^--- here" << std::endl;
}
}
}
Printing the coveted output:
<task>
<class>
<simple>
<identifier>caption</identifier>
<literal> </literal>
</simple>
</class>
<class>
<simple>
<identifier>columns</identifier>
<literal>Column Name</literal>
</simple>
</class>
<class>
<container>
<identifier>CONTAINER_NAME</identifier>
<literal> </literal>
<complex>
<identifier>OBJECT_NAME</identifier>
<literal> </literal>
<simple>
<identifier>obj_id</identifier>
<literal> </literal>
</simple>
<simple>
<identifier>obj_property1</identifier>
<literal> </literal>
</simple>
<simple>
<identifier>obj_attribute</identifier>
<literal> </literal>
<enumeration>
<word>EnumOption1</word>
<word>EnumOption2</word>
<word>EnumOption3</word>
<word>EnumOption4</word>
</enumeration>
</simple>
<container>
<identifier>OBJECT_ITEMS</identifier>
<literal> </literal>
<complex>
<identifier>OBJECT_ITEM</identifier>
<literal> </literal>
<simple>
<identifier>obj_item_name</identifier>
<literal> </literal>
</simple>
<container>
<identifier>set_value</identifier>
<literal> </literal>
<simple>
<identifier>obj_item_value</identifier>
<literal> </literal>
</simple>
</container>
</complex>
</container>
</complex>
</container>
</class>
</task>
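Since the error handling never fires for the sample input, here is the line/column arithmetic from the catch block pulled out into a helper you can test on its own. Location and locate are hypothetical names, and unlike the demo this version searches for the line break strictly before the position, so it also behaves when the position sits on a line break itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

struct Location {
    std::size_t line;      // 1-based line number
    std::size_t column;    // 0-based offset within that line
    std::string_view text; // the line itself, without its line break
};

// Hypothetical helper: maps a character offset to line, column and line text.
Location locate(std::string_view input, std::size_t pos) {
    pos = std::min(pos, input.size());

    // Start of line: one past the last line break strictly before pos (0 if none)
    std::size_t bol = 0;
    if (pos > 0) {
        std::size_t nl = input.find_last_of("\r\n", pos - 1);
        if (nl != std::string_view::npos)
            bol = nl + 1;
    }

    // End of line: the first line break at or after pos (end of input if none)
    std::size_t eol = input.find_first_of("\r\n", pos);
    if (eol == std::string_view::npos)
        eol = input.size();

    // Line number: count the '\n' characters before the start of the line
    auto newlines = std::count(input.begin(), input.begin() + bol, '\n');
    std::size_t line = 1 + static_cast<std::size_t>(newlines);

    return {line, pos - bol, input.substr(bol, eol - bol)};
}
```

In the catch block you would pass ef.first - input.begin() as pos, print text, and then pad with column spaces before the caret.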
I still don't understand how the CONTAINER_NAME: "namespacing" works, so I'll leave that to you to get right.