
I am trying to match a C++ argument type, which can contain balanced < and > characters.

With this regex: (\<(?>[^<>]|(?R))*\>)

On this string: QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>

It matches everything except the first 4 characters (QMap).

However, if I add \w+ at the start of my regex, it only matches the end of the string (QPair<QMap<Something, Complex> >>) and not the whole string.

What is the explanation and how to solve this?

You can try it online here.

This is intended for use in Perl 5.10+ (5.24).

Wiktor Stribiżew
Denis Rouzaud

1 Answer


The (?R) construct recurses the entire pattern. When you add \w+ at the start, it becomes part of what is recursed, so every recursive call must also match \w+ before the <. However, what you want to recurse is only the capturing group that matches the bracketed part.
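To see the effect, here is a minimal Perl sketch that runs the pattern with and without the leading \w+ on the string from the question. The variable names ($s, $plain, $prefixed) are mine, not from the question, and the script assumes Perl 5.10+ for (?R) and the /p flag. It only prints what each variant matches so the two can be compared.

```perl
use strict;
use warnings;

my $s = 'QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>';

# Original pattern: (?R) re-enters the whole regex, which starts with '<',
# so a recursive call can begin at every nested '<'.
my ($plain) = $s =~ /(<(?>[^<>]|(?R))*>)/;

# With a leading \w+, each (?R) call must match \w+ first,
# so a recursive call can no longer begin at a nested '<'.
my $prefixed = $s =~ /\w+(<(?>[^<>]|(?R))*>)/p ? ${^MATCH} : undef;

print "plain:    $plain\n";
print "prefixed: ", $prefixed // '(no match)', "\n";
```

The prefixed variant cannot cover the whole input, because the outer < ... > can only close if the nested brackets are consumed by a recursive call.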

You need a subroutine call that will recurse the capturing group subpattern:

(\w+)(<(?:[^<>]++|(?2))*>)

See the regex demo

Details

  • (\w+) - Group 1 capturing the identifier (you may change it to [a-zA-Z]\w*)
  • (<(?:[^<>]++|(?2))*>) - Group 2 (that will be recursed)
    • < - a literal <
    • (?:[^<>]++|(?2))* - zero or more repetitions of either 1+ chars other than < and > (matched possessively, to make it faster) or (|) the whole Group 2 pattern, called as a subroutine with (?2).
    • > - a literal >

Results:

Match:   QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>
Group 1: QMap
Group 2: <QgsFeatureId, QPair<QMap<Something, Complex> >>
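
As a quick check in Perl itself, the following sketch applies the pattern to the string from the question. The variable names ($s, $name, $args) are illustrative only, and the script assumes Perl 5.10+ for (?2) and possessive quantifiers.

```perl
use strict;
use warnings;

my $s = 'QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>';

# (?2) calls only the Group 2 subpattern, so the leading (\w+)
# is not required again at each nesting level.
my ($name, $args) = $s =~ /(\w+)(<(?:[^<>]++|(?2))*>)/;

if (defined $name) {
    print "Match:   $name$args\n";
    print "Group 1: $name\n";
    print "Group 2: $args\n";
}
else {
    print "No match\n";
}
```

Capture groups set during a recursive call are restored when the call returns, so $2 holds the outermost bracketed part rather than the innermost one.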
Wiktor Stribiżew