I am working on https://github.com/F-Bergemann/RegexSplitter.
Purpose: parse a regular expression string and split it into breakable and unbreakable top-level substrings. Breakable substrings can be broken up again; unbreakable substrings must remain as is. 'Groups' and 'character classes' are unbreakable.
I am currently working on the 'character classes'. For those, I mainly use qi::rule<Iterator, std::string()> parsers, and only a single qi::rule<Iterator, ASTNode*> parser for the root parser. I.e. only the root parser should create an AST result; the child parsers should just validate.

When testing the compiled regex-splitter, I get this:

> ./regex-splitter "[1]"
TEST:[1]
### ASTNode c'tor (std::string &) #1: Unbreakable
### ASTNode c'tor (std::string &) #2: U:[11]
### ASTNode c'tor (ASTNode const *, std::vector<ASTNode *> &) #1: Collection
### ASTNode d'tor #1: Unbreakable
### ASTNode d'tor #2: U:[11]
### ASTNode c'tor (ASTNode const *, std::vector<ASTNode *> &) #2: Collection
U:[11], 
### ASTNode d'tor #1: Collection
### ASTNode d'tor #2: C:[11]

I.e. instead of "[1]" I get "[11]" as the result.
I know it has to do with the following part of the code:

    tok_set_item =
        tok_range | tok_char
        ;

    tok_range =
        tok_char >> qi::char_('-') >> tok_char
        ;

    tok_char =
        qi::alnum // TODO BNF: <char> ::= any non metacharacter | "\" metacharacter
        ;

It seems to try tok_range first, then switch to tok_char.
But why do I get "[11]" here?
It should just validate the syntax and return the original data.

I tried to find out what happens with the parser action here.
I have no explicit parser action here.
What is it using implicitly? boost::variant<...>?
Does it make a difference when I use qi::as_string[...] wrappers?

Frank Bergemann

1 Answer

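It's the age-old "container attributes aren't atomic" pitfall: when tok_range fails after its first tok_char has already matched, the '1' it appended to the std::string attribute is not rolled back, so the tok_char alternative then appends '1' a second time.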

You can paper over it using qi::hold. Or you can revise your strategy.

I'll see whether I can find some time to review the code. You might post at CodeReview.stackexchange.com for good measure.

sehe
  • So... I did a review of the code, and reduced it all the way down to ~40 lines of code (see [commits](https://github.com/sehe/RegexSplitter/tree/sehe) and **[Live Demo](https://wandbox.org/permlink/DmguiV9MEIFo32nP)**). Note how it doesn't do any dynamic allocation (outside the vector), no semantic actions, no manual debugging, no more duplicated rules etc. It does just work, just using `qi::raw`. – sehe Feb 04 '21 at 23:58
  • Thanks a lot! This is a very good example (reference) for how to use BOOST_FUSION_ADAPT_STRUCT() and boost::iterator_range<> to populate different data types with dedicated boost::iterator_range<> parsers, and inter-op for populating a resulting data structure. – Frank Bergemann Feb 06 '21 at 06:31