I would like to force a rematch in the following scenario: I'm trying to inverse-match a qualifier after each element in a list, i.e. have the qualifier reported again after every identifier. In other words, I have:

"int a, b, c" =~ m{
(?(DEFINE)
    (?<qualifs>\s*(?<qualif>\bint\b|\bfloat\b)\s*+(?{print $+{qualif}  . "\n"}))
    (?<decl>\s*(?!(?&qualif))(?<ident>[_a-zA-Z][_a-zA-Z0-9]*+)\s*(?{print $+{ident} . "\n"}))

    (?<qualifsfacet>\s*\bint\b\s*+)
    (?<declfacet>[_a-zA-Z][_a-zA-Z0-9]*+)
)


^((?&qualifsfacet)*+(?!(?&decl))
                |(?&qualifs)*+(?&declfacet)
                |((?&qualifsfacet)
                (?&declfacet)(?<negdecl>\g{lastnegdecl}(,(?&decl)))
                |(?&qualifs)*+(?&declfacet)(?<lastnegdecl>\g{negdecl})
                (?# Here how to force it to retry last with new lastnegdecl)))$
                }xxs;

I would like the output to be:

a
int
b
int
c
int

Currently I only get this:

a
int
int

I think this might work if there is a way to tell the regex engine to retrigger a match for the new `lastnegdecl` that is being captured.

AnArrayOfFunctions
  • Currently the regex matches the second alternative `(?&qualifs)*+ (?&declfacet)` so it will not try the third alternative where you define the `lastnegdecl`, right? Btw, why do you use `*+` in `(?&qualifs)*+` ? Shouldn't it be `*?` ? – Håkon Hægland Sep 22 '21 at 12:31
  • @HåkonHægland Sorry, I'm used to matching inside a nested pattern. It should try it if you add `^` and `$`, which I missed. I'll edit my question now. `*+` is the possessive (atomic) greedy zero-or-more quantifier, I believe. – AnArrayOfFunctions Sep 22 '21 at 12:36
  • It seems like an issue that `negdecl` and `lastnegdecl` will not get defined if the sub pattern fails, but both sub patterns depend on the other being defined – Håkon Hægland Sep 22 '21 at 12:55
  • @HåkonHægland I actually figured that out and put a check on the first instance like so: `(?(<lastnegdecl>)\g{lastnegdecl}|)`. But then I can't understand why `negdecl` isn't defined in `lastnegdecl`. – AnArrayOfFunctions Sep 22 '21 at 12:58
  • It is not clear what the bigger picture is. For example, why do you want this specific output? What are you trying to parse, is it a C program? Then why did you not use a dedicated parser like [`Marpa::R2`](https://metacpan.org/pod/Marpa::R2) or [`Regexp::Grammars`](https://metacpan.org/pod/Regexp::Grammars)? It might be easier to suggest alternative solutions if we know more about what you are trying to achieve. – Håkon Hægland Sep 22 '21 at 13:41
  • @HåkonHægland Well, I could use the clang frontend as well, but my goal is to write it on my own with pure regex, since I find this innovative and promising. I need this specific output so that my C++ generating code can work (I normally embed the Perl interpreter in native code and call hooks on each mid-pattern code block), because it takes an identifier and then registers the type. – AnArrayOfFunctions Sep 22 '21 at 13:47
  • Sounds like an interesting project, yes. So currently you are using the regex match for its side effect by calling `(?{print ...})` inside? You do not make use of the resulting match variables like `%+` after the regex match? – Håkon Hægland Sep 22 '21 at 14:14

1 Answer


Well, after some experimenting I finally figured it out (apart from the obvious whitespace issues I had in my original post):

"int a, b, c" =~ m{
(?(DEFINE)
    (?<qualifs>\s*+(?<qualif>\bint\b|\bfloat\b)\s*+(?{print $+{qualif}  . "\n"}))
    (?<decl>\s*+(?!(?&qualif))(?<ident>[_a-zA-Z][_a-zA-Z0-9]*+)\s*(?{print $+{ident} . "\n"}))

    (?<qualifsfacet>\s*+(\bint\b|\bfloat\b)\s*+)
    (?<declfacet>\s*+[_a-zA-Z][_a-zA-Z0-9]*+\s*+)
)


^((?&qualifsfacet)(?!(?&decl))
                |(?&qualifs)*+(?&declfacet)
                |(?<restoutter>(?=(?&qualifsfacet)(?&declfacet)
                (?<rest>(?(<rest>)\g{rest}),(?&decl)))
                ((?&qualifs)(?&declfacet)\g{rest}|(?&restoutter)))
                |(?&qualifsfacet)(?&declfacet)(,(?&declfacet))*+)$
                }xxs;

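Basically, I use a positive lookahead in which each `decl` is called with its code block but the qualifiers are not (the lookahead uses `qualifsfacet`, which has no code), while every matched `decl` is concatenated into `rest`. After the lookahead I attempt a partial match of `qualifs` followed by `\g{rest}`; if that does not match, the pattern recurses into `restoutter` and does the same thing again with the longer `rest`. Maybe someone can explain it better, but it works. The output of the program above is: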

a
int
b
int
c
int

And there is a full match.
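
For cross-checking the event order that the pattern's code blocks are meant to produce, a plain capture-and-split version can serve as a reference. This is only a minimal sketch and not the pure-regex technique used above: the helper name `decl_events` is hypothetical, and it assumes a single `int` or `float` qualifier followed by a comma-separated identifier list.

```perl
use strict;
use warnings;

# Hypothetical reference helper (not part of the regex solution above):
# returns the identifier/qualifier events in the order the hooks expect.
sub decl_events {
    my ($src) = @_;

    # Assumption: exactly one qualifier, then a comma-separated identifier list.
    my ($qualif, $list) = $src =~ /^\s*(int|float)\b\s*(.*?)\s*$/s
        or return;

    my @events;
    for my $ident (split /\s*,\s*/, $list) {
        # Reject anything that is not a plain C identifier.
        return unless $ident =~ /^[_a-zA-Z][_a-zA-Z0-9]*$/;
        # Identifier first, then the qualifier repeated for that identifier.
        push @events, $ident, $qualif;
    }
    return @events;
}

print "$_\n" for decl_events("int a, b, c");
```

Comparing this list with what the `(?{print ...})` blocks emit is one way to notice code blocks that fire too often or too rarely while the pattern backtracks.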

AnArrayOfFunctions