If you're parsing C++ code this way you may be biting off more than you can chew.
I'll answer, but the answer should show you how limited this approach is going to be. Just imagine parsing through
namespace q::x {
namespace y {
struct A {
template <typename = ns1::c<int>, typename...> struct C;
};
template <typename T, typename... Ts>
struct A::C final : ns2::ns3::base<A::C<T, Ts...>, Ts...> {
int simple = [](...) {
enum class X : unsigned { answer = 42, };
struct {
auto operator()(...) -> decltype(auto) {
return static_cast<int>(X::answer);
}
} iife;
return iife();
}("/* }}} */"); // {{{
};
}
}
and getting it right. And yes. that's valid code.
In fact it's so tricky, that it's easy to make "grown compilers" (GCC) trip: https://wandbox.org/permlink/FzcaSl6tbn18jq4f (Clang has no issue: https://wandbox.org/permlink/wu0mFwQiTOogKB5L).
That said, let me refer to my old explanation of how rule declarations and skippers work together: Boost spirit skipper issues
And show an approximation of what I'd do.
The comments
Actually, the comments should be part of your skipper, so let's make it so:
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
Now for general skipping, we want to include whitespace:
SkipRule comment_or_ws
= qi::space | comment_only;
Now we want to parse types and identifiers:
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
I've /guessed/ that struct X { };
would be an example of a "scope" for you, and the tuple would contain ("struct", "X")
.
As a bonus I used attribute adaption of std::pair
and show how to insert into a multimap
for good measure later on
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
Note a big short-coming here is that the first non-commented '}'
will "end" the scope. That's not how the language works (see the leading example)
Now we can conclude with an improved "block" rule:
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
Note that
- we override the comment_or_ws
skipper with comment_only
so we don't drop all whitespace from "rest content"
- inversely, we override the skipper to include whitespace inside the scope
rule because otherwise the negative scope
invocation (char_ - scope
) would do the wrong thing because it wouldn't skip whitespace
Full Demo
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
int main() {
using It = std::string::const_iterator;
using namespace qi::labels;
std::string rest_content;
std::multimap<std::string, std::string> scopes;
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
SkipRule comment_or_ws
= qi::space | comment_only;
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
//BOOST_SPIRIT_DEBUG_NODES((block)(scope)(identifier)(type))
std::string const code = R"(
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
)";
qi::phrase_parse(begin(code), end(code), block, comment_or_ws);
for (auto& [k,v] : scopes) {
std::cout << k << ": " << v << "\n";
}
std::cout << "------------------ BEGIN REST_CONTENT -----------------\n";
std::cout << rest_content << "\n";
std::cout << "------------------ END REST_CONENT --------------------\n";
}
Which parses the following sample input:
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
Printing
class: forward_declared
class: Derived
enumclass: MyEnum
namespace: q
struct: base
------------------ BEGIN REST_CONTENT -----------------
;}template <typename T>;;
int main() {
return 0;
}
------------------ END REST_CONENT --------------------
Conclusion
This result seems a decent pointer to
- explain how to tackle the specific hurdle
- demonstrate how this approach to parsing is breaking down at the slightest obstacle (
namespace a { namespace b { } }
for example)
Caveat Emptor