For the development of a Spirit X3 parser I want to use semantic actions (see footnote 1). It is important for me to be in control of how attributes are stored into STL containers.

This question is about how to make sure that the parser attribute, _attr( ctx ), matches the rule type, _val( ctx ), so that it can be assigned properly. Maybe this question boils down to how to apply the undocumented transform_attribute feature. But please read along with me to see whether that is actually what solves it in the example code.

Printing types of objects/variables

What I found very useful is the ability to print the types of _attr( ctx ) and _val( ctx ) in a semantic action when I am experimenting with different grammar expressions.

So based on the answer of Howard Hinnant, I wrote a utility header file to provide facilities like this according to my preferences.

The code below is to be put in a file named utility.h:

#include <cstdlib>     // free
#include <string>
#include <type_traits>
#include <typeinfo>
#include <cxxabi.h>    // abi::__cxa_demangle (GCC/Clang)

namespace utility
{

template<typename T>
std::string type2string()
{
  std::string r;
  typedef typename std::remove_reference<T>::type TR;

  std::string space = "";
  if ( std::is_const<TR>::value )
    { r = "const"; space = " "; }
  if ( std::is_volatile<TR>::value )
    { r += space + "volatile"; space = " "; }

  int status;
  char* demangled =
    abi::__cxa_demangle( typeid(TR).name(), nullptr, nullptr, &status );
  switch ( status )
  {
    case  0: { goto proceed; }
    case -1: { r = "type2string failed: malloc failure"; goto fail; }
    case -2: { r = "type2string failed: " + std::string(typeid(TR).name()) +
      " nonvalid C++ ABI name"; goto fail; }
    case -3: { r = "type2string failed: invalid argument(s)"; goto fail; }
    default: { r = "type2string failed: unknown status " +
      std::to_string( status ); goto fail; }
  }
  proceed:
  r += space + demangled;
  free( demangled );

  /* references are without a space */
  if ( std::is_lvalue_reference<T>::value ) { r += '&'; }
  if ( std::is_rvalue_reference<T>::value ) { r += "&&"; }

  fail:
  return r;
}

}
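For comparison, here is a more compact sketch of the same idea. It assumes GCC or Clang (for cxxabi.h), and the names type_name and free_deleter are made up for this illustration. A unique_ptr owns the buffer returned by abi::__cxa_demangle, so the goto/free bookkeeping is not needed, and a failed demangle simply falls back to the mangled name.

```cpp
#include <cstdlib>      // std::free
#include <cxxabi.h>     // abi::__cxa_demangle (GCC/Clang only)
#include <memory>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical helper: releases the malloc'ed buffer from __cxa_demangle.
struct free_deleter
{
    void operator()(void* p) const { std::free(p); }
};

// Hypothetical compact variant of type2string.
template <typename T>
std::string type_name()
{
    using TR = typename std::remove_reference<T>::type;

    int status = 0;
    std::unique_ptr<char, free_deleter> demangled(
        abi::__cxa_demangle(typeid(TR).name(), nullptr, nullptr, &status));

    // Fall back to the mangled name if demangling failed.
    std::string r = (status == 0 && demangled) ? demangled.get()
                                               : typeid(TR).name();

    // typeid drops top-level cv-qualifiers and references; add them back.
    if (std::is_volatile<TR>::value) { r = "volatile " + r; }
    if (std::is_const<TR>::value)    { r = "const " + r; }
    if (std::is_lvalue_reference<T>::value) { r += '&'; }
    if (std::is_rvalue_reference<T>::value) { r += "&&"; }
    return r;
}
```

Usage is the same as with type2string, for example type_name< decltype( _attr( ctx ) ) >() inside a semantic action.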

Now the actual working example code:

#include <cstddef>
#include <cstdio>
#include <cstdint>
#include <iostream> // std::cin, used with getline in main

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>

#include <string>
#include <vector>
#include <utility> // this is for std::move
#include "utility.h" // to print types

namespace client
{
  namespace x3 = boost::spirit::x3;
  namespace ascii = boost::spirit::x3::ascii;

  namespace semantic_actions
  {
    using x3::_val;  // assign to _val( ctx )
    using x3::_attr; // from _attr( ctx )    

    struct move_assign
    {  
      template <typename Context>
      void operator()(const Context& ctx) const
      {
        printf( "move_assign\n" );
        _val( ctx ) = std::move( _attr( ctx ) );
      }
    };

    struct print_type
    {
      template <typename Context>
      void operator()(const Context& ctx) const
      {
        printf( "print_type\n" );

        std::string str;
        str = utility::type2string< decltype( _attr( ctx ) ) >();
        printf( "_attr type: %s\n", str.c_str() );

        // reuse str
        str = utility::type2string< decltype( _val( ctx ) ) >();
        printf( "_val type: %s\n", str.c_str() );
      }
    };
  }

  namespace parser
  {
    using x3::char_;
    using x3::lit;
    using namespace semantic_actions;

    x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";

    const auto main_rule__def = (*( !lit(';') >> char_) >> lit(';'))[print_type()][move_assign()];

    BOOST_SPIRIT_DEFINE( main_rule_ )

    const auto entry_point = x3::skip(x3::space)[ main_rule_ ];
  }
}

int main()
{
  printf( "Give me a string to test rule.\n" );
  printf( "Type [q or Q] to quit.\n" );

  std::string input_str;
  std::string output_str;

  while (getline(std::cin, input_str))
  {
    if ( input_str.empty() || input_str[0] == 'q' || input_str[0] == 'Q')
    { break; }

    auto first = input_str.begin(), last = input_str.end();

    if ( parse( first, last, client::parser::entry_point, output_str) )
    {
      printf( "Parsing succeeded\n" );
      printf( "input:  \"%s\"\n", input_str.c_str() );
      printf( "output: \"%s\"\n", output_str.c_str() );
    }
    else
    {
      printf( "Parsing failed\n" );
    }
  }

  return 0;
}

The input is always: abcd;

output:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule>
  <try>abcd;</try>
print_type
_attr type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
move_assign
  <success></success>
  <attributes>[a, b, c, d]</attributes>
</main_rule>
Parsing succeeded
input:  "abcd;"
output: "abcd"

OK, so far so good. But suppose I would like to include the semicolon in the parsed result. I change the grammar line to:

const auto main_rule__def = (*( !lit(';') >> char_) >> char_(";"))[print_type()];

Note: I removed the semantic action [move_assign()] because it fails to compile due to incompatible _attr and _val types. Now the output is:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule>
  <try>abcd;</try>
print_type
_attr type: boost::fusion::deque<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char>&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
  <success></success>
  <attributes>[]</attributes>
</main_rule>
Parsing succeeded
input:  "abcd;"
output: ""

Now the _attr type of boost::fusion::deque<> is not what I want; I just want it to be std::string. I don’t understand why _attr is still not of the _val type when the complete right-hand side of the grammar assignment is inside the semantic action brackets. Would the X3 feature transform_attribute help here? And how should I apply that? Or what is another good way to solve this, without having to work with Boost Fusion class interfaces or other implementation details?

Current workaround

The current workaround for me is to define another rule just to be assigned from the first rule with a semantic action. Only there the _attr is of std::string type.

  namespace parser
  {
    using x3::char_;
    using x3::lit;
    using namespace semantic_actions;

    x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";
    x3::rule<struct main_rule2_class, std::string> main_rule2_ = "main_rule2";

    const auto main_rule__def = *( !lit(';') >> char_) >> char_(";");
    const auto main_rule2__def = main_rule_[print_type()][move_assign()];

    BOOST_SPIRIT_DEFINE( main_rule_, main_rule2_ )

    const auto entry_point = x3::skip(x3::space)[ main_rule2_ ];
  }

output:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule2>
  <try>abcd;</try>
  <main_rule>
    <try>abcd;</try>
    <success></success>
    <attributes>[a, b, c, d, ;]</attributes>
  </main_rule>
print_type
_attr type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
move_assign
  <success></success>
  <attributes>[a, b, c, d, ;]</attributes>
</main_rule2>
Parsing succeeded
input:  "abcd;"
output: "abcd;"

I hope there is a way without having to make another rule just to get the type of _attr to match _val.

(1) I don’t appreciate the hidden cleverness the authors put into this library, as just one innocent-looking change can break the application, whereas a more explicit and elaborate approach would communicate much more clearly what is going on. I just had to get this off my chest.

sehe
Zeyneb
  • oh yeah, it's Spirit v3.0.3, included in boost 1.69.0. Compiled with mingw-w64 GCC 8.1 in C++14 mode. – Zeyneb Jun 29 '19 at 18:03
  • About the "hidden cleverness" - this is actually the selling point. It's like any other framework with conventions/heuristics: there's a sweet spot when you know how to navigate the edges. Otherwise, I'd consider hand-rolling a parser or using a proper code-generator like ANTLR, CoCo/C++ etc. – sehe Jun 30 '19 at 11:00

2 Answers

With char_(';'), the attribute has 2 parts. Both parts need to be added to _val. Something like:

namespace semantic_actions
{
  using x3::_val;  // assign to _val( ctx )
  using x3::_attr; // from _attr( ctx )    
  using boost::fusion::at_c;

  struct move_assign
  {  
    template <typename Context>
    void operator()(const Context& ctx) const
    {
      printf( "move_assign\n" );
      auto& attr = _attr( ctx ); // bind by reference to avoid copying the fusion sequence
      _val( ctx ) = at_c<0>( attr );
      _val( ctx ) += at_c<1>( attr );       
    }
  };
.
.
.
}
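To make the type mismatch concrete without pulling in Boost, here is a small stand-alone sketch that uses std::tuple as a stand-in for boost::fusion::deque. The helpers append_part and flatten are hypothetical names for this illustration only, and the code needs C++17. The static_assert mirrors why the plain move_assign no longer compiles, and flatten does by hand what the semantic action above does with at_c.

```cpp
#include <string>
#include <tuple>
#include <type_traits>

// A two-part attribute cannot be assigned to a std::string directly,
// which is the same kind of mismatch that breaks `_val = std::move(_attr)`.
static_assert(
    !std::is_assignable<std::string&, std::tuple<std::string, char>>::value,
    "a (string, char) pair is not assignable to std::string");

// Hypothetical helpers: append one part of the attribute to the result.
inline void append_part(std::string& out, const std::string& s) { out += s; }
inline void append_part(std::string& out, char c) { out += c; }

// Hypothetical helper: concatenate all parts of the tuple, in order.
template <typename... Parts>
std::string flatten(const std::tuple<Parts...>& parts)
{
    std::string out;
    std::apply(
        [&out](const auto&... p) { (append_part(out, p), ...); },
        parts);
    return out;
}
```

For instance, flatten(std::make_tuple(std::string("abcd"), ';')) yields "abcd;", which corresponds to the two at_c calls above.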
user1681377
  • I wouldn't use a parser generator if it required clumsy code like this. It works, though, so props for that. – sehe Jun 30 '19 at 10:58
  • Thanks for your suggestion to help me out. I knew it would have been possible to use the fusion class interface, but that was something I wanted to avoid. I think it wouldn't surprise you that I go with sehe's advice. – Zeyneb Jun 30 '19 at 16:01
  • @sehe, there's no need for the `as` alias template, at least according to results [here](https://coliru.stacked-crooked.com/a/fb47058fb6837d0d). – user1681377 Feb 27 '23 at 16:07
  • @user1681377 possible. There have been many improvements and regressions over time. Which is fine, as [X3 was (is?) still considered experimental](https://www.boost.org/doc/libs/1_81_0/libs/spirit/doc/html/spirit/what_s_new.html) - at least during most of that time. However this answer addresses the *explicit* question of how to coerce the attribute type. This is often required to get grammars to compile - and more importantly, to reliably synthesize the expected attribute values - in the absence of x3::rule instances. – sehe Feb 27 '23 at 17:31

Direct Answer

transform_attribute is not yet documented for X3 (https://www.boost.org/doc/libs/1_70_0/libs/spirit/doc/x3/html/index.html) but you can find its Qi counterpart here: https://www.boost.org/doc/libs/1_70_0/libs/spirit/doc/html/spirit/advanced/customize/transform.html.

"Would the X3 feature transform_attribute help here? And how should I apply that?"

Regardless, it's an implementation detail that you can easily access by using rules. I like to use anonymous rules to help with this:

template <typename T>
    struct as_type {
        template <typename E>
        constexpr auto operator[](E e) const { return x3::rule<struct _, T> {} = e; }
    };

template <typename T>
    static inline constexpr as_type<T> as;

Now you can write

const auto main_rule__def = as<std::string> [ (*(char_ - ';') >> char_(';')) ];

Live On Coliru

#include <iostream>
//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <iomanip> // std::quoted

namespace client {
    namespace x3 = boost::spirit::x3;
    namespace ascii = boost::spirit::x3::ascii;

    namespace parser {
        using x3::char_;
        using x3::lit;

        x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";

        template <typename T>
            struct as_type {
                template <typename E>
                constexpr auto operator[](E e) const { return x3::rule<struct _, T> {} = e; }
            };

        template <typename T>
            static inline constexpr as_type<T> as;

        const auto main_rule__def = as<std::string> [ (*(char_ - ';') >> char_(';')) ];

        BOOST_SPIRIT_DEFINE(main_rule_)

        const auto entry_point = x3::skip(x3::space)[main_rule_];
    } // namespace parser
} // namespace client

int main() {
    std::string output_str;
    for(std::string const input_str : { "abcd;" }) {
        auto first = input_str.begin(), last = input_str.end();

        if (parse(first, last, client::parser::entry_point, output_str)) {
            std::cout << "Parsing succeeded\n";
            std::cout << "input:  " << std::quoted(input_str) << "\n";
            std::cout << "output:  " << std::quoted(output_str) << "\n";
        } else {
            std::cout << "Parsing failed\n";
        }
    }
}

Prints

Parsing succeeded
input:  "abcd;"
output:  "abcd;"

In theory there might be performance overhead, but I strongly suspect all compilers will inline everything here since nothing has external linkage or vtables, and everything is const/constexpr.

Alternatives, simplifications:

Use x3::raw

In this case you could have gotten the behaviour you want using an existing directive: x3::raw

Live On Coliru

const auto main_rule__def = x3::raw [ *(char_ - ';') >> ';' ];
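As a rough mental model only (this is plain standard C++, not X3 code, and the function names are invented for the illustration): raw reports which part of the input matched, as a pair of iterators, instead of collecting the attributes of the subparsers. A string can then be built from that range.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <utility>

using Iter = std::string::const_iterator;

// Hypothetical stand-in for  raw[ *(char_ - ';') >> ';' ]  without a skipper.
// On success it returns the matched input range [first, one past the ';').
inline std::optional<std::pair<Iter, Iter>>
match_through_semicolon(Iter first, Iter last)
{
    const Iter semi = std::find(first, last, ';');
    if (semi == last) { return std::nullopt; }   // no ';' found: no match
    return std::make_pair(first, semi + 1);      // range includes the ';'
}

// Any container with an iterator-pair constructor can be filled from the range.
inline std::string range_to_string(const std::pair<Iter, Iter>& r)
{
    return std::string(r.first, r.second);
}
```

With the input "abcd;rest" the returned range covers "abcd;". How X3 itself moves such a range into the target attribute is governed by its attribute propagation rules, which this sketch does not try to reproduce.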

Don't use rule<> always

rule<> is only required if you have recursive rules or need external linkage on rules (defining them in separate translation units). Without it, the whole program shrinks to ...

Live On Coliru

#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <iomanip> // std::quoted

namespace x3 = boost::spirit::x3;
namespace client::parser {
    auto const entry_point = x3::raw [ *(x3::char_ - ';') >> ';' ];
}

int main() {
    for(std::string const input : { "abcd;" }) {
        std::string output;
        if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
            std::cout << "Parsing succeeded\n";
            std::cout << "input:  " << std::quoted(input) << "\n";
            std::cout << "output: " << std::quoted(output) << "\n";
        } else {
            std::cout << "Parsing failed\n";
        }
    }
}

Finally - About skipping

I don't think you want char_ - ';' (or the more elaborate way you spelled it: !lit(';') >> char_). With the skipper it will parse across whitespace ("ab c\nd ;" -> "abcd;").

You would probably want to make the rule more restrictive (like lexeme[+(graph - ';')] or even simply raw[lexeme[+(alnum|'_')]] or lexeme[+char_("a-zA-Z0-9_")]).
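The difference can be modeled by hand in plain C++. The two functions below are hypothetical stand-ins, not X3 code: the first mimics skip(space)[ *(char_ - ';') >> char_(';') ], where whitespace is skipped before every character, and the second mimics skip(space)[ lexeme[+(graph - ';')] >> char_(';') ], where the token has to be contiguous.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Model of the skipping grammar: whitespace is dropped before each character,
// so the result can span several whitespace-separated pieces of input.
inline std::optional<std::string> parse_with_skipper(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(in[i]))) { continue; } // pre-skip
        out += in[i];
        if (in[i] == ';') { return out; }    // terminating ';' matched
    }
    return std::nullopt;                     // ran out of input before ';'
}

// Model of the lexeme grammar: skipping is off inside the token.
inline std::optional<std::string> parse_lexeme(const std::string& in)
{
    std::size_t i = 0;
    auto skip = [&] {
        while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i]))) { ++i; }
    };

    skip();                                  // pre-skip before the lexeme
    std::string out;
    while (i < in.size() && in[i] != ';' &&
           std::isgraph(static_cast<unsigned char>(in[i]))) {
        out += in[i++];
    }
    if (out.empty()) { return std::nullopt; } // '+' needs at least one character

    skip();                                  // skipper is active again
    if (i == in.size() || in[i] != ';') { return std::nullopt; }
    return out + ';';
}
```

In this model, "ab c\nd ;" is accepted by the first function as "abcd;" but rejected by the second, while "  abcd ;" is accepted by both.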

See Boost spirit skipper issues

sehe
  • Thanks again sehe for this excellent answer! As you know my style is a bit more verbose than yours, but particularly in grammar expressions it's important to eliminate boilerplate like the extra rules I had to define. Your as facility is great to use for this. I tested it and it works for me too. – Zeyneb Jun 30 '19 at 16:06
  • For debugging, the anonymous rule name was way too long. So I specified an empty string like this: return x3::rule<struct _, T> {""} = e; – Zeyneb Jun 30 '19 at 16:43
  • About using x3::raw: you force Spirit to work just with iterators and no Boost Fusion, right? I remember the author mentioning once on the Spirit mailing list that no construction takes place with raw. But apparently, even without semantic actions, this parser is still able to put the result in a std::string. Does this mean Spirit is calling the basic_string( InputIt first, InputIt last ) constructor somewhere? How does this work for other types? Maybe I can put x3::raw on specific grammar fragments. But sometimes I just want to get the actual types in semantic actions. – Zeyneb Jun 30 '19 at 17:04
  • It does indeed return an iterator range. However, the iterator range is _just the synthesized attribute_ type now, and propagation still conforms to all the attribute propagation rules, including Boost Fusion (which you can actually tell, because otherwise it wouldn't propagate into `std::string`). – sehe Jun 30 '19 at 19:25
  • "But sometimes I just want to get the actual types" - `std::string` is "the actual type" (actually, in the AST). If you mean "actually the synthesized type", I venture that's rarely truly the case, but you can: http://coliru.stacked-crooked.com/a/2cd2ddc904d7a6c7 or even worse: http://coliru.stacked-crooked.com/a/4d714f2156304206 – sehe Jun 30 '19 at 19:33
  • Regarding "how does this work for other types" - consider asking a question when you run into an issue – sehe Jun 30 '19 at 19:34