
I'm trying to write an HTML parser with Boost Spirit X3, and I wrote the parsers below.

The problem is that this code doesn't compile. The error is:

fatal error C1202: recursive type or function dependency context too complex

I know this error occurs because my parser html_element_ references tag_block_, and tag_block_ references html_element_, but I don't know how to make it work.

#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3/support/ast/position_tagged.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iostream>
using namespace boost::spirit::x3;
struct tag_name{};
struct html_tag;
struct html_comment;
struct attribute_data : boost::spirit::x3::position_tagged {
  std::string name;
  boost::optional<std::string> value;
};


struct tag_header :  boost::spirit::x3::position_tagged {
  std::string name;
  std::vector<attribute_data> attributes;
};

struct self_tag: boost::spirit::x3::position_tagged {
  tag_header header;
};

struct html_element : boost::spirit::x3::position_tagged, boost::spirit::x3::variant< std::string, self_tag, boost::recursive_wrapper<html_tag>>{
  using base_type::base_type;
  using base_type::operator=;
};



struct html_tag: boost::spirit::x3::position_tagged {
  tag_header header;
  std::vector<html_element> children;
};

BOOST_FUSION_ADAPT_STRUCT(attribute_data, name, value);
BOOST_FUSION_ADAPT_STRUCT(tag_header, name, attributes);
BOOST_FUSION_ADAPT_STRUCT(self_tag, header);
BOOST_FUSION_ADAPT_STRUCT(html_tag,header,children);

// These are the attributes parser, seems fine
struct attribute_parser_id;
auto attribute_identifier_= rule<attribute_parser_id, std::string>{"AttributeIdentifier"} = lexeme[+(char_ - char_(" /=>"))];
auto attribute_value_= rule<attribute_parser_id, std::string>{"AttributeValue"} =
                           lexeme["\"" > +(char_ - char_("\"")) > "\""]|lexeme["'" > +(char_ - char_("'")) > "'"]|
                           lexeme[+(char_ - char_(" />"))];
auto single_attribute_ = rule<attribute_parser_id, attribute_data>{"SingleAttribute"} = attribute_identifier_ > -("=">  attribute_value_);
auto attributes_ = rule<attribute_parser_id, std::vector<attribute_data>>{"Attributes"} = (*single_attribute_);


struct tag_parser_id;


auto tag_name_begin_func = [](auto &ctx){
  get<tag_name>(ctx) = _attr(ctx).name;
  //_val(ctx).header.name = _attr(ctx);
  std::cout << typeid(_val(ctx)).name() << std::endl;

};
auto tag_name_end_func = [](auto &ctx){
  _pass(ctx) = get<tag_name>(ctx) == _attr(ctx);
};

auto self_tag_name_action = [](auto &ctx){
  _val(ctx).header.name = _attr(ctx);
};
auto self_tag_attribute_action = [](auto &ctx){
  _val(ctx).header.attributes = _attr(ctx);
};

auto inner_text = lexeme[+(char_-'<')];
auto tag_name_ = rule<tag_parser_id, std::string>{"HtmlTagName"} = lexeme[*(char_ - char_(" />"))];
auto self_tag_ = rule<tag_parser_id, self_tag>{"HtmlSelfTag"} = '<' > tag_name_[self_tag_name_action] > attributes_[self_tag_attribute_action] > "/>";
auto tag_header_ = rule<tag_parser_id, tag_header>{"HtmlTagBlockHeader"} = '<' > tag_name_ > attributes_ > '>';

rule<tag_parser_id, html_tag> tag_block_;

rule<tag_parser_id, html_element> html_element_ = "HtmlElement";

auto tag_block__def = with<tag_name>(std::string())[tag_header_[tag_name_begin_func] > (*html_element_) > "</" > omit[tag_name_[tag_name_end_func]] > '>'];
auto html_element__def = inner_text | self_tag_ | tag_block_ ;

BOOST_SPIRIT_DEFINE(tag_block_, html_element_);
int main()
{
  std::string source = "<div data-src=\"https://www.google.com\" id='hello world'></div>";
  html_element result;
  auto const parser = html_element_;
  auto parse_result = phrase_parse(source.begin(), source.end(), parser, ascii::space, result);
}



I tried reading the boost::spirit::qi example in the official documentation, as well as the X3 documentation. The Qi example only parses tags, not attributes. The example in the X3 documentation is different, and I think my case is harder.

genpfault
Hackman Lo

2 Answers


On reading, the first thing I notice is that self_tag_ uses expectation points. That won't fly because it is ordered before other things that can legally start with <, like tag_block_:

auto html_element__def = inner_text | self_tag_ | tag_block_ ;

And due to the expectation points it will never backtrack to reach that.

Many places use operator+ where operator* is required, like:

auto inner_text = lexeme[*(char_-'<')];

All those charset differences can be phrased as inverse sets:

auto inner_text = lexeme[*~char_('<')];
//
    = lexeme[*~char_(" />")];

This ignores the fact that XML defines specific valid character sets for, e.g., element names, but I'm assuming you expressly want to avoid writing a conformant parser. Even so, you really need to exclude '<', '>', '\r', '\t' etc. from your attribute name/value rules.

One smell is the re-use of parser rule tags. This should, as far as my understanding goes, be fine for immediately-defined rules, but certainly not for those that are defined through their tag type, with BOOST_SPIRIT_DEFINE.

Cleanup Exercism

First, a cleanup. This gets past the template instantiation depth hurdle by commenting out *html_element_ inside tag_block__def, so let's first see what does work:

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iomanip>
#include <iostream>

//// Unused mixin disabled for simplicity
// #include <boost/spirit/home/x3/support/ast/position_tagged.hpp>

namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace Ast {
    struct tag_name {};
    struct html_tag;
    struct html_comment;

    // using mixin = x3::position_tagged;
    struct mixin {};

    struct attribute_data : mixin {
        std::string                  name;
        boost::optional<std::string> value;
    };
    using attribute_datas = std::vector<attribute_data>;

    struct tag_header : mixin {
        std::string     name;
        attribute_datas attributes;
    };

    struct self_tag : mixin {
        tag_header header;
    };

    using element_base =
        x3::variant<std::string, self_tag, boost::recursive_wrapper<html_tag>>;

    struct html_element : mixin , element_base {
        using element_base::element_base;
        using element_base::operator=;
    };

    using html_elements = std::vector<html_element>;

    struct html_tag : mixin {
        tag_header    header;
        html_elements children;
    };
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::attribute_data, name, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::tag_header, name, attributes)
BOOST_FUSION_ADAPT_STRUCT(Ast::self_tag, header)
BOOST_FUSION_ADAPT_STRUCT(Ast::html_tag, header, children)

namespace Parser {
    auto attribute_identifier_                                                         //
        = x3::rule<struct AttributeIdentifier_tag, std::string>{"AttributeIdentifier"} //
        = x3::lexeme[+~x3::char_(" /=>")];

    auto attribute_value_                                                    //
        = x3::rule<struct AttributeValue_tag, std::string>{"AttributeValue"} //
    = x3::lexeme                                                             //
        [('"' > *~x3::char_('"') > '"')                                      //
         | ("'" > *~x3::char_("'") > "'")                                    //
         | *~x3::char_(" />")                                                //
    ];

    auto single_attribute_ =
        x3::rule<struct attribute_identifier__tag, Ast::attribute_data>{"SingleAttribute"} //
        = attribute_identifier_ >> -("=" >> attribute_value_);

    auto attributes_                                                              //
        = x3::rule<struct attribute_data_tag, Ast::attribute_datas>{"Attributes"} //
        = *single_attribute_;

    [[maybe_unused]] static auto& header_of(x3::unused_type) {
        thread_local Ast::tag_header s_dummy;
        return s_dummy;
    }
    [[maybe_unused]] static auto& header_of(Ast::html_tag& ht) {
        return ht.header;
    }

    auto tag_name_begin_func = [](auto &ctx){
        get<Ast::tag_name>(ctx) = _attr(ctx).name;
        // header_of(_val(ctx)).name = _attr(ctx);
        // std::cout << typeid(_val(ctx)).name() << std::endl;
    };

    auto tag_name_end_func         = [](auto& ctx){ _pass(ctx) = (get<Ast::tag_name>(ctx) == _attr(ctx)); };
    auto self_tag_name_action      = [](auto &ctx){ header_of(_val(ctx)).name = _attr(ctx); };
    auto self_tag_attribute_action = [](auto& ctx) { header_of(_val(ctx)).attributes = _attr(ctx); };

    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

    auto self_tag_                                                       //
        = x3::rule<struct HtmlSelfTag_tag, Ast::self_tag>{"HtmlSelfTag"} //
        = '<' >> tag_name_[self_tag_name_action] >> attributes_[self_tag_attribute_action] >> "/>";

    auto tag_header_                                                                     //
        = x3::rule<struct HtmlTagBlockHeader_tag, Ast::tag_header>{"HtmlTagBlockHeader"} //
        = '<' >> tag_name_ >> attributes_ >> '>';

    x3::rule<struct tag_block__tag, Ast::html_tag>        tag_block_    = "TagBlock";
    x3::rule<struct html_element__tag, Ast::html_element> html_element_ = "HtmlElement";

    auto tag_block__def = x3::with<Ast::tag_name>(""s)                        //
        [                                                                     //
            tag_header_[tag_name_begin_func] >> /**html_element_ >>*/ "</" >> //
            x3::omit[tag_name_[tag_name_end_func]] >> '>'                     //
        ];

    auto inner_text        = x3::lexeme[*~x3::char_('<')];
    auto html_element__def = inner_text | self_tag_ | tag_block_;

    BOOST_SPIRIT_DEFINE(tag_block_, html_element_)
}

namespace unit_tests {
    template <bool ShouldSucceed = true, typename P>
    void test(P const& rule, std::initializer_list<std::string_view> cases) {
        for (auto input : cases) {
            if constexpr (ShouldSucceed) {
                typename x3::traits::attribute_of<P, x3::unused_type>::type result;

                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space, result);
                std::cout << quoted(input) << " -> " << (ok ? "Ok" : "FAILED") << std::endl;
            } else {
                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space);
                if (!ok)
                    std::cout << "Fails as expected: " << quoted(input) << std::endl;
                else
                    std::cout << "SHOULD HAVE FAILED: " << quoted(input) << std::endl;
            }
        }
    }
}

int main() {
    unit_tests::test(Parser::self_tag_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword/>)",
                         R"(<div />)",
                         R"(<div/>)",
                         R"(< div/>)",
                     });

    unit_tests::test(Parser::html_element_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword></simple>)",
                         R"(<div ></div>)",
                         R"(<div></div>)",
                         R"(< div></div>)",
                         R"(< div ></div>)",
                         R"(<div data-src="https://www.google.com" id='hello world'></div>)",

                         R"(<div></ div>)",
                         R"(<div></ div >)",
                     });

    unit_tests::test<false>(Parser::self_tag_,
                            {
                                R"(<div/ >)",
                                R"(<div>< /div>)",
                                R"(<div></dov>)",
                            });
}

Outputs

"<simple foo=\"\" bar='' value-less qux=bareword/>" -> Ok   
"<div />" -> Ok
"<div/>" -> Ok
"< div/>" -> Ok
"<simple foo=\"\" bar='' value-less qux=bareword></simple>" -> Ok
"<div ></div>" -> Ok
"<div></div>" -> Ok
"< div></div>" -> Ok
"< div ></div>" -> Ok
"<div data-src=\"https://www.google.com\" id='hello world'></div>" -> Ok
"<div></ div>" -> Ok
"<div></ div >" -> Ok
Fails as expected: "<div/ >"
Fails as expected: "<div>< /div>"
Fails as expected: "<div></dov>"

What Is The Trouble

As you can deduce from my hunch to comment out the recursive *html_element_, the recursion is what's causing problems.

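The real reason is that with<> extends the context type. Because the with<> sits inside the recursion, each level of recursion wraps the context in yet another layer, so every level requires a new, distinct template instantiation and the compiler never reaches a fixed point.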

The simplest trick is to move with<> up outside the recursion:

auto tag_block__def =                                             //
    tag_header_[tag_name_begin_func] >> *html_element_ >> "</" >> //
    x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //
    ;

auto inner_text        = x3::lexeme[*~x3::char_('<')];
auto html_element__def = inner_text | self_tag_ | tag_block_;
auto start             = x3::with<Ast::tag_name>(""s)[html_element_];

However, this highlights the problem that elements can nest: a single tag_name in the context is useless once inner tags overwrite it. So, instead of a string, we could make it a stack<string>:

auto start = x3::with<tag_stack>(std::stack<std::string>{})[html_element_];

And then amend the actions to match:

auto tag_name_begin_func = [](auto& ctx) { get<tag_stack>(ctx).push(_attr(ctx).name); };

auto tag_name_end_func = [](auto& ctx) {
    auto& s    = get<tag_stack>(ctx);
    _pass(ctx) = (s.top() == _attr(ctx));
    s.pop();
};

See it Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iomanip>
#include <iostream>
#include <stack>

//// Unused mixin disabled for simplicity
// #include <boost/spirit/home/x3/support/ast/position_tagged.hpp>

namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace Ast {
    struct html_tag;
    struct html_comment;

    // using mixin = x3::position_tagged;
    struct mixin {};

    struct attribute_data : mixin {
        std::string                  name;
        boost::optional<std::string> value;
    };
    using attribute_datas = std::vector<attribute_data>;

    struct tag_header : mixin {
        std::string     name;
        attribute_datas attributes;
    };

    struct self_tag : mixin {
        tag_header header;
    };

    using element_base =
        x3::variant<std::string, self_tag, boost::recursive_wrapper<html_tag>>;

    struct html_element : mixin , element_base {
        using element_base::element_base;
        using element_base::operator=;
    };

    using html_elements = std::vector<html_element>;

    struct html_tag : mixin {
        tag_header    header;
        html_elements children;
    };
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::attribute_data, name, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::tag_header, name, attributes)
BOOST_FUSION_ADAPT_STRUCT(Ast::self_tag, header)
BOOST_FUSION_ADAPT_STRUCT(Ast::html_tag, header, children)

namespace Parser {
    struct tag_stack final {};

    auto attribute_identifier_                                                         //
        = x3::rule<struct AttributeIdentifier_tag, std::string>{"AttributeIdentifier"} //
        = x3::lexeme[+~x3::char_(" /=>")];

    auto attribute_value_                                                    //
        = x3::rule<struct AttributeValue_tag, std::string>{"AttributeValue"} //
    = x3::lexeme                                                             //
        [('"' > *~x3::char_('"') > '"')                                      //
         | ("'" > *~x3::char_("'") > "'")                                    //
         | *~x3::char_(" />")                                                //
    ];

    auto single_attribute_ =
        x3::rule<struct attribute_identifier__tag, Ast::attribute_data>{"SingleAttribute"} //
        = attribute_identifier_ >> -("=" >> attribute_value_);

    auto attributes_                                                              //
        = x3::rule<struct attribute_data_tag, Ast::attribute_datas>{"Attributes"} //
        = *single_attribute_;

    [[maybe_unused]] static auto& header_of(x3::unused_type) {
        thread_local Ast::tag_header s_dummy;
        return s_dummy;
    }
    [[maybe_unused]] static auto& header_of(Ast::html_tag& ht) {
        return ht.header;
    }

    auto tag_name_begin_func = [](auto& ctx) { get<tag_stack>(ctx).push(_attr(ctx).name); };

    auto tag_name_end_func = [](auto& ctx) {
        auto& s    = get<tag_stack>(ctx);
        _pass(ctx) = (s.top() == _attr(ctx));
        s.pop();
    };
    auto assign_name  = [](auto& ctx) { header_of(_val(ctx)).name = _attr(ctx); };
    auto assign_attrs = [](auto& ctx) { header_of(_val(ctx)).attributes = _attr(ctx); };
    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

    auto self_tag_                                                       //
        = x3::rule<struct HtmlSelfTag_tag, Ast::self_tag>{"HtmlSelfTag"} //
        = '<' >> tag_name_[assign_name] >> attributes_[assign_attrs] >> "/>";

    auto tag_header_                                                                     //
        = x3::rule<struct HtmlTagBlockHeader_tag, Ast::tag_header>{"HtmlTagBlockHeader"} //
        = '<' >> tag_name_ >> attributes_ >> '>';

    x3::rule<struct tag_block__tag, Ast::html_tag>        tag_block_    = "TagBlock";
    x3::rule<struct html_element__tag, Ast::html_element> html_element_ = "HtmlElement";

    auto tag_block__def =                                             //
        tag_header_[tag_name_begin_func] >> *html_element_ >> "</" >> //
        x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //
        ;

    auto inner_text        = x3::lexeme[*~x3::char_('<')];
    auto html_element__def = inner_text | self_tag_ | tag_block_;
    auto start             = x3::with<tag_stack>(std::stack<std::string>{})[html_element_];

    BOOST_SPIRIT_DEFINE(tag_block_, html_element_)
}

namespace unit_tests {
    template <bool ShouldSucceed = true, typename P>
    void test(P const& rule, std::initializer_list<std::string_view> cases) {
        for (auto input : cases) {
            if constexpr (ShouldSucceed) {
                typename x3::traits::attribute_of<P, x3::unused_type>::type result;

                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space, result);
                std::cout << quoted(input) << " -> " << (ok ? "Ok" : "FAILED") << std::endl;
            } else {
                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space);
                if (!ok)
                    std::cout << "Fails as expected: " << quoted(input) << std::endl;
                else
                    std::cout << "SHOULD HAVE FAILED: " << quoted(input) << std::endl;
            }
        }
    }
}

int main() {
    unit_tests::test(Parser::self_tag_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword/>)",
                         R"(<div />)",
                         R"(<div/>)",
                         R"(< div/>)",
                     });

    unit_tests::test(Parser::start,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword></simple>)",
                         R"(<div ></div>)",
                         R"(<div></div>)",
                         R"(< div></div>)",
                         R"(< div ></div>)",
                         R"(<div data-src="https://www.google.com" id='hello world'></div>)",

                         R"(<div></ div>)",
                         R"(<div></ div >)",

                         R"(<div><nest/><nest some="more">yay</nest></div>)",
                     });

    unit_tests::test<false>(Parser::self_tag_,
                            {
                                R"(<div/ >)",
                                R"(<div>< /div>)",
                                R"(<div></dov>)",
                            });
}

Printing

"<simple foo=\"\" bar='' value-less qux=bareword/>" -> Ok
"<div />" -> Ok
"<div/>" -> Ok
"< div/>" -> Ok
"<simple foo=\"\" bar='' value-less qux=bareword></simple>" -> Ok
"<div ></div>" -> Ok
"<div></div>" -> Ok
"< div></div>" -> Ok
"< div ></div>" -> Ok
"<div data-src=\"https://www.google.com\" id='hello world'></div>" -> Ok
"<div></ div>" -> Ok
"<div></ div >" -> Ok
"<div><nest/><nest some=\"more\">yay</nest></div>" -> Ok
Fails as expected: "<div/ >"
Fails as expected: "<div>< /div>"
Fails as expected: "<div></dov>"

CLOSING THOUGHTS

I'm answering this assuming you are just doing this to learn X3. Otherwise the only recommendation is: do not do this. Use a library.

Not only does your grammar do a pretty poor job of parsing XML, it will utterly fail on HTML in the wild. Closing tags are not a given in HTML ("quirks mode"). Scripts, CDATA, entity references, Unicode, escapes will all f*ck your parser up.

Oh, have you noticed how you mostly broke attribute propagation by introducing some semantic actions? I could show you how to fix it, but I think I'd rather leave it for the moment.

Just use a library.

sehe
  • Ty for the answer, I learned a lot. I may need some time to fully understand your answer. I'm not really familiar with Spirit – Hackman Lo Dec 31 '22 at 13:05
  • Let me know if you have questions. This is a Q&A site after all :) – sehe Dec 31 '22 at 13:17
  • Ty again for the detailed description. I never thought the problem was the "with" part of the parser. And I did notice that semantic actions would break attribute propagation; my previous approach was to assign attributes in semantic actions. You mentioned you have a better solution for that? – Hackman Lo Jan 04 '23 at 05:14
  • In general yes, I [avoid semantic actions](https://stackoverflow.com/questions/8259440/boost-spirit-semantic-actions-are-evil). So you have to figure out carefully whether/why you cannot use automatic propagation. I'm pretty suspicious of the AST, as it is. Haven't given it much thought but to me it doesn't express binary/unary operator relations in the classical sense which gives me concern that it will be hard to correctly interpret/manipulate the AST in terms of precedence rules and associativity. – sehe Jan 04 '23 at 12:38
  • I mentioned "I could show you how to fix it" - not suggesting that my solution is _better_ - just tediously getting things correct. And I would definitely first try to reshape the AST to better reflect the domain. – sehe Jan 04 '23 at 12:39
  • Hmm. I was definitely mixing up questions :) I do think that the AST doesn't feel like a natural fit (with the artificial "header" group), but that should really aid automatic attribute propagation here. In terms of domain congruence I'd prefer to parse an element ("tags" aren't really a thing at the markup level) with a single production, whether they are self-closing, empty, parent or mixed. On a tactical level your AST might be the most friction-less approach as it avoids a-symmetric push/pop scenarios. Still not convinced it is worth putting effort in doing this instead of using a library – sehe Jan 04 '23 at 13:11
  • I know using a library is simpler. But actually I don't really want to parse XML or HTML. As I mentioned before, I'm trying to learn Spirit, and XML or HTML is just one attempt. It may be a little harder than the examples. This code is a stitched-together monster: some parts are from examples, some parts are my extensions of the examples, so the code looks ugly. But it does teach me more about Spirit. You know, the best way to learn something is not to read a book, but to solve a concrete problem. – Hackman Lo Jan 05 '23 at 00:43
  • You didn't actually mention that before, even though I've explicitly asked :) No worries, thanks for clarifying. I completely agree with the learning approach. My approach has been to solve actual problems posted on SO :) – sehe Jan 05 '23 at 01:58
  • Oh? I didn't mention that? OOPS, maybe I just thought I mentioned it LOL. – Hackman Lo Jan 05 '23 at 06:17

The initial solution to the problem of, among other things, matching begin/end tags is greatly simplified here. The simplification focuses solely on the "matching begin/end tags" part of the problem. It makes no attempt at parsing strings; instead it simply parses x3::uint_. This is sufficient to illustrate a solution to that subproblem, because its essence is matching begin tags with end tags. More specifically, the problem of inferring that the attribute of this expression:

    auto tag_header_
        = ( '<'
         >> tag_name_
         >> '>'
          )
    #ifdef USE_SEMANTIC_ACTIONS
          [tag_name_begin_func]
    #endif
        ;

is the same as the attribute of this expression:

    auto tag_footer_
        = ( "</"
         >> tag_name_
         >> '>'
          )
    #ifdef USE_SEMANTIC_ACTIONS
          [tag_name_end_func]
    #endif
        ;

is visually much simpler than inferring that the attribute of this expression:

    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

is the same as the attribute of this expression:

        "</" >> //
        x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //  

The latter two, visually complicated, expressions were copy-and-pasted from here.

Furthermore, tag_name_ and inner_text are also much simpler. The original:

   auto tag_name_                                                     //
       = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
       = x3::lexeme[*~x3::char_(" />")];
   auto inner_text        = x3::lexeme[*~x3::char_('<')];

is obviously, and distractingly, more complicated than the simplified version:

    auto tag_name_
        = x3::uint_;
    auto inner_text        = x3::uint_;

Now, the reader may note that the original solution contained several statements which sehe called "immediately-defined rules". An immediately-defined rule may be abstracted as the pattern:

    auto RuleDef
      = x3::rule<struct RuleTag, RuleAttribute>{"RuleName"}
      = RuleRhs;

In this abstraction the CamelCase identifiers are pattern parameters, which are replaced to create an actual instance of an immediately-defined rule, somewhat like instantiating a template. In the tag_name_ instance above, the following replacements were made:

  RuleDef -> tag_name_
  RuleTag -> HtmlTagName_tag
  RuleAttribute -> std::string
  RuleName -> HtmlTagName
  RuleRhs -> x3::lexeme[*~x3::char_(" />")] 

But what is the purpose of an immediately-defined rule? One reason is to convert the attribute of RuleRhs to RuleAttribute, as shown here. (The example may be a bit hard to understand because the immediately-defined rule is obscured by being inside the expression passed as the parser argument to the parse function.)

However, there's no need for such conversions in the simplification; hence, all the immediately-defined rules were removed as a further simplification.

  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 09 '23 at 13:07