Matching text not enclosed by parentheses

Question

I am still learning Perl, so apologies if this is an obvious question. Is there a way to match text that is NOT enclosed by parentheses? For example, searching for foo in the text below should match on the second line only.

(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
)

Are you using negative lookahead? – Boschko Dec 12 '17 at 18:02

ikegami · Answer 1 · 2017-12-13T15:19:16.047

5

Regex patterns have an implicit leading \G(?s:.)*? ("skip characters until a match is found"). The following expands that definition so that a balanced, possibly nested, pair of parentheses counts as a single "character" to skip.

use feature qw( say );   # say() needs Perl 5.10+

# $string holds the text to search.
while (
   $string =~ m{
      \G (?&MEGA_DOT)*?

      ( foo )

      (?(DEFINE)
         (?<MEGA_DOT> [^()] | \( (?&MEGA_DOT)*+ \) )
      )
   }xg
) {
   say "Found a match at pos $-[1].";
}
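
For readers who want to run this end to end, here is a minimal sketch that wraps the same pattern in a helper and feeds it the sample from the question. The sub name positions_outside_parens, its $word parameter, and the $sample variable are illustrative assumptions, not part of the answer above. It needs Perl 5.10 or later for (?&NAME) and (?(DEFINE)...).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns the offsets at which $word occurs
# outside any balanced pair of parentheses in $string.
sub positions_outside_parens {
    my ($string, $word) = @_;
    my @positions;
    while (
        $string =~ m{
            \G (?&MEGA_DOT)*?    # lazily skip units, starting where the last match ended

            ( \Q$word\E )        # the literal word we want, captured as group 1

            (?(DEFINE)
                (?<MEGA_DOT>
                    [^()]                    # one ordinary, non-paren character
                  | \( (?&MEGA_DOT)*+ \)     # or one whole balanced paren group
                )
            )
        }xg
    ) {
        push @positions, $-[1];   # start offset of group 1
    }
    return @positions;
}

my $sample = <<'END';
(bar foo bar)
bar foo (
bar foo
   (bar) (foo)
)
END

for my $pos (positions_outside_parens($sample, 'foo')) {
    # Line number = 1 + number of newlines before the match offset.
    my $prefix = substr($sample, 0, $pos);
    my $line   = 1 + ($prefix =~ tr/\n//);
    say "Found a match at pos $pos (line $line).";
}
```

With the question's sample this should report a single match, on the second line. Because the pattern is anchored with \G, scanning stops at the first unbalanced parenthesis rather than skipping over it.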

edited Dec 13 '17 at 15:19

answered Dec 12 '17 at 18:05

Admirable! But I reckon this is a reason why people get frightened when they scent Perl... ;-) – PerlDuck Dec 12 '17 at 19:53
@PerlDuck, Actually, it's amazing how simple and structured Perl made this. – ikegami Dec 12 '17 at 20:00
As a beginner, I couldn't understand this, so kindly provide a detailed methodology for learning it. – ssr1012 Dec 13 '17 at 06:49
I second @ssr1012's request. – Tohiko Dec 13 '17 at 13:02
I already explained the methodology in the answer. The rest is documented in [perlre](http://perldoc.perl.org/perlre.html), but it's either elementary regex, or obvious (`(?&NAME)` matches the pattern defined by `(?<NAME> ... )`). – ikegami Dec 13 '17 at 15:14

zdim · Accepted Answer · 2018-01-21T05:46:17.147

4

This is very far from "obvious"; quite the contrary. There is no direct way to say "don't match" for a complex pattern (though there is good support at the character level, with [^a], \S, etc.). Regex is primarily about matching things, not about not matching them.

One approach is to match those (possibly nested) delimiters and get everything other than that.

A good tool for finding nested delimiters is the core module Text::Balanced. As it matches, it can also give us the substring before the match and the rest of the string after the match.

use warnings;
use strict;
use feature 'say';

use Text::Balanced qw(extract_bracketed);

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
   )
END

my ($match, $before);
my $remainder = $text;
while (1) {
    ($match, $remainder, $before) = extract_bracketed($remainder, '(', '[^(]*');
    print $before // $remainder;
    last if not defined $match; 
}

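extract_bracketed returns the match, the remainder substring ($remainder), and the substring before the match ($before), so we keep matching in the remainder. Once no further bracketed group can be extracted, $match and $before come back undefined, so the last print emits the leftover $remainder and the loop exits.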
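
To get from "print everything outside the parentheses" to the search the question asks for, the loop above can be wrapped in a helper that returns the outside text, which is then searched normally. This is a minimal sketch under that assumption: the sub name text_outside_parens is hypothetical, and only extract_bracketed comes from Text::Balanced.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: returns $text with every balanced (...) group removed.
sub text_outside_parens {
    my ($text) = @_;
    my $outside   = '';
    my $remainder = $text;
    while (1) {
        # Skip everything up to the next '(' and extract the balanced group there.
        my ($match, $rest, $before) = extract_bracketed($remainder, '(', '[^(]*');
        if (defined $match) {
            $outside  .= $before;    # text that preceded the group
            $remainder = $rest;      # keep scanning after the group
        }
        else {
            $outside .= $remainder;  # nothing more to extract: keep the tail
            last;
        }
    }
    return $outside;
}

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo
   (bar) (foo)
   )
END

my $outside = text_outside_parens($text);
my $count   = () = $outside =~ /foo/g;   # count matches in list context
say "foo occurs $count time(s) outside parentheses";
```

Note that an unbalanced opening parenthesis makes the extraction fail, so everything from that point on is treated as outside text.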

Taken from this post, where there are more details and another way, using Regexp::Common.

edited Jan 21 '18 at 05:46

answered Dec 12 '17 at 19:31

I didn't know about this module. Thanks! Though, I am finding it difficult to find the line number when matching inside `$text` or `$lead`. One way could be to count the number of newline characters in `$match`. But is there a better way? – Tohiko Dec 13 '17 at 13:01
@Tohiko Welcome. You mean to find what char/line in the source is found? Counting `\n` in the `$lead` (or `$text`, as it's being depleted) won't inform what line it is in the source. I'll look into it. – zdim Dec 13 '17 at 21:09
@Tohiko Note that I changed `$lead` to `$before` and `$text` to `$remainder` – zdim Jan 21 '18 at 05:48
@Tohiko I didn't find a better way than counting. It is not tested well (works on _this_ example) and there are probably all kinds of cases for which it won't, but here is the idea. I add `$line_cnt=1` and after `print ...` add the line `$line_cnt += () = ($before =~ /\n/g) if $before;`. The module may provide a better way but it's been a while since I read the whole page carefully. Will update if I find it – zdim Jan 21 '18 at 05:48
@Tohiko Another feature is `$@->{pos}`, see [Diagnostics](https://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS). But in this case, because of recursive processing, I still have to count lines. – zdim Jan 21 '18 at 06:15
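
The counting idea from the comments can be sketched as follows. This is a minimal, lightly checked sketch: the helper name lines_outside_parens and its $word parameter are hypothetical, and unlike the one-liner in the comment it also counts the newlines inside each skipped $match, on the assumption that those lines must be accounted for to stay in step with the source.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: returns the 1-based source line numbers at which
# $word occurs outside balanced parentheses.
sub lines_outside_parens {
    my ($text, $word) = @_;
    my @lines;
    my $line_cnt  = 1;        # source line on which the current chunk starts
    my $remainder = $text;
    while (1) {
        my ($match, $rest, $before) = extract_bracketed($remainder, '(', '[^(]*');
        # Text outside parentheses for this round: the skipped prefix on
        # success, or the whole leftover input once extraction fails.
        my $chunk = defined $match ? $before : $remainder;
        while ($chunk =~ /\Q$word\E/g) {
            my $head = substr($chunk, 0, $-[0]);     # chunk text ahead of this hit
            push @lines, $line_cnt + ($head =~ tr/\n//);
        }
        last if not defined $match;
        # Advance past both the chunk and the skipped bracketed group.
        $line_cnt += ($before =~ tr/\n//) + ($match =~ tr/\n//);
        $remainder = $rest;
    }
    return @lines;
}

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo
   (bar) (foo)
   )
END

say "foo found outside parentheses on line $_" for lines_outside_parens($text, 'foo');
```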
