
I needed some Perl code to match balanced parens in a string.

So I found the regular expression below, written for .NET, and pasted it into my Perl program, thinking the regex engines were similar enough for it to work:

 /
           \s*\(
           (?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+ 
           (?(openp)(?!)) 
           \)\s*
        /x

My understanding of how this regex works is as follows:

  1. Match the first paren:
\(
  2. Match pattern a, b, or c at least once:
(?: <a> | <b> | <c> )+

where a, b, and c are:

  • a is any character that is not a paren
[^\(\)]
  • b is a character that is a left paren
\(
  • c is a character that is a right paren
\)

and:

  • b is a capture group that pushes onto the named capture "openp"
(?<openp>\()
  • c is a capture group that pops from the named capture "openp"
(?<-openp>\))

  3. Reject any match where the openp stack is not empty:
(?(openp)(?!))

  4. Match the end paren:
\)
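The push/pop bookkeeping described above can be mimicked without any special regex features. The sketch below is not part of the original program: `eat_parens_counter` is a hypothetical name, and it swaps the .NET balancing group for an explicit depth counter (increment on `(`, decrement on `)`, stop when it gets back to zero). It is meant to illustrate the stack idea, not to be an exact port of the .NET pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: emulates the push/pop of the "openp" stack with a
# plain depth counter instead of a regex. Returns the text after the first
# balanced (...) group, minus any whitespace right after it, or the input
# unchanged if no balanced group is found.
sub eat_parens_counter {
    my ($line) = @_;
    my $start = index($line, '(');
    return $line if $start < 0;            # no opening paren at all

    my $depth = 0;
    for my $i ($start .. length($line) - 1) {
        my $c = substr($line, $i, 1);
        if ($c eq '(') {
            $depth++;                      # "push" onto openp
        } elsif ($c eq ')') {
            $depth--;                      # "pop" from openp
            if ($depth == 0) {             # stack empty: outer group closed
                my $rest = substr($line, $i + 1);
                $rest =~ s/^\s+//;         # plays the role of the trailing \s*
                return $rest;
            }
        }
    }
    return $line;                          # input ended while still unbalanced
}

print eat_parens_counter("(( (sdfasd)sdfsas (sdfasd) )sadf) ()"), "\n";
```

With the question's test string this prints `()`, the text left over after the first balanced group.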

Here's the perl code:

sub eat_parens($) {
    my $line = shift;    
    if ($line !~ /
           \s*\(
           (?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+ 
           (?(openp)(?!)) 
           \)\s*
        /x)
    {
        return $line;
    }    
    return $';
}

sub testit2 {
    my $t1 = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";
    my $t2 = eat_parens($t1);
    print "t1: $t1\n";
    print "t2: $t2\n";
}

testit2();

The error is:

$ perl x.pl
Sequence (?<-...) not recognized in regex; marked by <-- HERE in m/\s*\((?: [^\(\)] | (?<openp> \( ) | (?<- <-- HERE openp> \) ) )+ (?(openp)(?!) ) \) \s*/ at x.pl line 411.

I'm not sure what's causing this. Any ideas?

pico
  • You're trying to use a complicated regular expression written for one dialect (C#/.net) with a different dialect (perl). Naturally there's going to be issues. – Shawn Nov 08 '22 at 14:49
  • See https://perldoc.perl.org/perlre#(?PARNO)-(?-PARNO)-(?+PARNO)-(?R)-(?0) – Shawn Nov 08 '22 at 14:55
  • @Shawn, I have a question about this link. I noticed the regex has a "foo" in the middle of it. Would I need to remove that to make it match only parens? – pico Nov 08 '22 at 14:59
  • Yes, you'd have to adjust the bit(s) that match literal strings as appropriate. – Shawn Nov 08 '22 at 15:04
  • Does this answer your question? [Can I use Perl regular expressions to match balanced text?](https://stackoverflow.com/questions/4445674/can-i-use-perl-regular-expressions-to-match-balanced-text) – Shawn Nov 08 '22 at 15:07
  • I don't see `(?<-...)` in the [.NET reference](https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference). Could you provide docs for it? – ikegami Nov 08 '22 at 15:08
  • (?<name>pattern) Captures the specified pattern into the specified group name. The string used for the name must not contain any punctuation and cannot begin with a number. https://regexhero.net/reference/ – pico Nov 08 '22 at 15:19
  • `(?<-...)` is not mentioned in that linked document either. It's a copy of the official docs I previously linked with stuff removed. So again I ask, could you provide docs for it? – ikegami Nov 08 '22 at 15:31

1 Answer


Here's one way to do it:

/
   (?&TEXT)

   (?(DEFINE)
      (?<TEXT>
         [^()]*+
         (?: \( (?&TEXT) \)
             [^()]*+
         )*+
      )
   )
/x
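To connect this back to the question, here is a sketch that plugs the grammar above into an `eat_parens` like the one in the question. The surrounding `\s* \( ... \) \s*` and the use of `$+[0]` (end offset of the match, from `@+`) in place of `$'` are my own choices, not part of the answer. It assumes Perl 5.10 or later, which is needed for `(?(DEFINE)...)`, `(?&NAME)` and possessive quantifiers.

```perl
use strict;
use warnings;

# TEXT matches any run of characters in which every paren is balanced,
# so "\( (?&TEXT) \)" matches exactly one balanced group.
my $paren_group = qr/
    \s* \( (?&TEXT) \) \s*

    (?(DEFINE)
        (?<TEXT>
            [^()]*+
            (?: \( (?&TEXT) \)
                [^()]*+
            )*+
        )
    )
/x;

sub eat_parens {
    my ($line) = @_;
    # On success, return everything after the matched group; otherwise
    # return the line unchanged, as the question's version does.
    return $line =~ $paren_group ? substr($line, $+[0]) : $line;
}

my $t1 = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";
my $t2 = eat_parens($t1);
print "t1: $t1\n";
print "t2: $t2\n";
```

With the question's test string, `t2` should come out as `()`.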

It can also be done without naming anything. Search for "recursive" in perlre.
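As a sketch of the unnamed form (my own example, not from the answer): `(?R)` recurses into the whole pattern, so a single balanced group can be matched without any named or numbered groups. `(?1)` would instead recurse into capture group 1. Used with `/g` in list context, the match returns each top-level group in the string.

```perl
use strict;
use warnings;

my $s = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";

# (?R) re-enters the entire pattern, so a nested "( ... )" is matched by
# the same rules as the outer one. [^()]++ is possessive, which avoids
# needless backtracking over long runs of non-paren text.
my @groups = $s =~ / \( (?: [^()]++ | (?R) )* \) /xg;

print "$_\n" for @groups;
```

For the question's test string this yields two groups: `(( (sdfasd)sdfsas (sdfasd) )sadf)` and `()`.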

ikegami