Questions tagged [regex-recursion]

Some modern regex flavors support recursion in Regex: Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three, support regular expression recursion. Ruby 1.9 supports capturing group recursion (the whole regex can be recursed if wrapped in a capturing group.) .NET does not support recursion, but it supports balancing groups that can be used instead of recursion to match balanced constructs.

From regular-expressions.info:
Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three, support regular expression recursion. Perl uses the syntax (?R) with (?0) as a synonym. Ruby 2.0 uses \g<0>. PCRE supports all three as of version 7.7. Earlier versions supported only the Perl syntax (which Perl actually copied from PCRE). Recent versions of Delphi, PHP, and R also support all three, as their regex functions are based on PCRE. JGsoft V2 also supports all variations of regex recursion.

While Ruby 1.9 does not have any syntax for regex recursion, it does support capturing group recursion. So you could recurse the whole regex in Ruby 1.9 if you wrap the whole regex in a capturing group. .NET does not support recursion, but it supports balancing groups that can be used instead of recursion to match balanced constructs.

As we'll see later, there are differences in how Perl, PCRE, and Ruby deal with backreferences and backtracking during recursion. While they copied each other's syntax, they did not copy each other's behavior. JGsoft V2, however, copied their syntax and their behavior. So JGsoft V2 has three different ways of doing regex recursion, which you choose by using a different syntax. But these differences do not come into play in the basic example on this page.

Boost 1.42 copied the syntax from Perl but its implementation is marred by bugs, which are still not all fixed in version 1.62. Most significantly, quantifiers other than * or {0,} cause recursion to misbehave. This is partially fixed in Boost 1.60 which correctly handles ? and {0,1} too.

The regexes a(?R)?z, a(?0)?z, and a\g<0>?z all match one or more letters a followed by exactly the same number of letters z. Since these regexes are functionally identical, we'll use the syntax with R for recursion to see how this regex matches the string aaazzz.

First, a matches the first a in the string. Then the regex engine reaches (?R). This tells the engine to attempt the whole regex again at the present position in the string. Now, a matches the second a in the string. The engine reaches (?R) again. On the second recursion, a matches the third a. On the third recursion, a fails to match the first z in the string. This causes (?R) to fail. But the regex uses a quantifier to make (?R) optional. So the engine continues with z which matches the first z in the string.

Now, the regex engine has reached the end of the regex. But since it's two levels deep in recursion, it hasn't found an overall match yet. It only has found a match for (?R). Exiting the recursion after a successful match, the engine also reaches z. It now matches the second z in the string. The engine is still one level deep in recursion, from which it exists with a successful match. Finally, z matches the third z in the string. The engine is again at the end of the regex. This time, it's not inside any recursion. Thus, it returns aaazzz as the overall regex match.

22 questions
6
votes
3 answers

Capturing text before and after a C-style code block with a Perl regular expression

I am trying to capture some text before and after a C-style code block using a Perl regular expression. So far this is what I have: use strict; use warnings; my $text = << "END"; int max(int x, int y) { if (x > y) { return x; } …
tjwrona1992
  • 8,614
  • 8
  • 35
  • 98
5
votes
2 answers

How exactly does this recursive regex work?

This is a followup to this question. Have a look at this pattern: (o(?1)?o) It matches any sequence of o with a length of 2n, with n ≥ 1. It works, see regex101.com (word boundaries added for better demonstration). The question is: Why? In the…
Imanuel
  • 3,596
  • 4
  • 24
  • 46
4
votes
2 answers

Why doesn't this recursive regex capture the entire code block?

I'm trying to write a recursive regular expression to capture code blocks, but for some reason it seems to not be capturing them properly. I would expect the code below to capture the full body of the function, but instead it only captures the…
tjwrona1992
  • 8,614
  • 8
  • 35
  • 98
3
votes
3 answers

How to get a group matched in a recursive regular expression?

I'm writing a simple regular expression that needs receive a pair of coordinates and/or a map name. For example: move 10 15 # should returns [[10, 15]] move 10 15 map # should returns [[10, 15, 'map']] move map # should returns [['map']] move 10 15…
macabeus
  • 4,156
  • 5
  • 37
  • 66
3
votes
1 answer

Regex Recursion: Nth Subpatterns

I'm trying to learn about Recursion in Regular Expressions, and have a basic understanding of the concepts in the PCRE flavour. I want to break a string: Geese (Flock) Dogs (Pack) into: Full Match: Geese (Flock) Dogs (Pack) Group 1: Geese…
rsylatian
  • 429
  • 2
  • 14
3
votes
3 answers

Regex Expression to capture repeated patterns

I've been running around internet trying to find out how to build a regular expression to capture text in the way I need it; so I saw some StackOverflow questions but none of them express what I want, but if you already saw something similar to my…
Larry
  • 96
  • 8
2
votes
2 answers

Regex: How can I catch big numbers with spaces in them?

I'm trying to catch all numbers from a string using Python regex. By numbers I mean integers and floats (using , or .). I managed to get it done using this regex : ([0-9]+[\,|\.][0-9]+|[0-9]+) But I have a problem, I need it to match big numbers…
cuzureau
  • 330
  • 2
  • 17
2
votes
1 answer

Backreferences to other recursion levels in PHP / PCRE

I was trying to find the answer on the Net but I was not able to. Does the third section on this page applies to PHP / PCRE or not? https://www.regular-expressions.info/recursebackref.html the "Backreferences to other recursion levels part". I am…
user9098366
1
vote
1 answer

RegEx to capture a Pascal procedure's body

I'm trying to write a RegEx to capture a Pascal procedure's body. My biggest problem so far is capturing a procedure which has a nested procedure inside. Test string: Test procedure A; procedure B; begin end; begin if True then …
Bruno Kinast
  • 1,068
  • 4
  • 17
1
vote
1 answer

Regex for matching text followed by text until closed bracket except matching opening bracket

Sorry if question is confusion. From below text value = ( select max( tbl.column_name ) from table tbl where trim(colum2)='value1' and (trim((colum3)))='value3') I want output below select max( tbl.column_name ) from table tbl where…
Siva
  • 13
  • 3
1
vote
1 answer

How can I use a recursive regex or another method to recursively validate this BBcode-like markup in Python?

I am attempting to write a program that validates documents written in a markup language similar to BBcode. This markup language has both matching ([b]bold[/b] text) and non-matching (today is [date]) tags. Unfortunately, using a different markup…
1
vote
1 answer

Don't understand where my "Unnecessary escape character" error comes from

I have a url that contains two values, a keyword and a location name. The keyword comes after the '/r/' and the location after the '/l/'. E.g: localhost:3000/search/r/keyword/l/location To get only the value after the '/r/' (keyword) and the value…
Cédric Bloem
  • 1,447
  • 3
  • 16
  • 34
1
vote
1 answer

Unexpected behavior around recursive regex

I am trying to match C++ argument type which can contain balanced characters. With this regex: (\<(?>[^<>]|(?R))*\>) On this string: QMap >> It matches all expect the first 4 characters…
Denis Rouzaud
  • 2,412
  • 2
  • 26
  • 45
0
votes
0 answers

Recursive regex complexity

For a regex that support only +,?,*,.,|,[..],[^..],^,$,(..), a matcher that lets you match recursion: {} with length m, and a string of length n. (the regex isn't supporting positive/negative look-ahead/behind) What is the best…
Ofek
  • 1,065
  • 6
  • 19
0
votes
0 answers

How can I use SPARQL regex to parse Wikitext and extract values from parameters in a Wikimedia Commons template?

This query against the Wikidata SPARQL endpoint returns the Wikitext content of the first 50 files in the Wikimedia Commons category "1930s photographs in Auckland Museum". For each file, I want to extract several pieces of data from that…
Hugh
  • 73
  • 6
1
2