I am still learning Perl, so apologies if this is an obvious question. Is there a way to match text that is NOT enclosed in parentheses? For example, in the text below, searching for foo should match only the occurrence on the second line.

(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
)
Tohiko

2 Answers


Regex patterns have an implicit leading \G(?s:.)*? ("skip characters until a match is found"). The following expands that definition to consider nested parens to be a character to skip.

while (
   $string =~ m{
      \G (?&MEGA_DOT)*?

      ( foo )

      (?(DEFINE)
         (?<MEGA_DOT> [^()] | \( (?&MEGA_DOT)*+ \) )
      )
   }xg
) {
   say "Found a match at pos $-[1].";
}
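
For readers who want to run the snippet above, here is a minimal self-contained sketch. It assumes the sample text from the question as `$string`; the `@positions` array is only an illustrative addition that collects the offsets before printing them.

```perl
use strict;
use warnings;
use feature 'say';

# Sample input from the question (an assumption for this demo).
my $string = join "\n",
    '(bar foo bar)',
    'bar foo (',
    'bar foo ',
    '   (bar) (foo)',
    ')';

my @positions;   # hypothetical helper: offsets of matches outside parentheses
while (
    $string =~ m{
        \G (?&MEGA_DOT)*?      # lazily skip plain characters and whole balanced groups
        ( foo )                # the text we are looking for
        (?(DEFINE)
            (?<MEGA_DOT> [^()] | \( (?&MEGA_DOT)*+ \) )
        )
    }xg
) {
    push @positions, $-[1];    # start offset of capture group 1
}

say "Found a match at pos $_." for @positions;
# prints: Found a match at pos 18.
```

Offset 18 is the foo on the second line. The other occurrences are swallowed as part of a balanced group by MEGA_DOT, so the \G-anchored search never stops inside them.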
ikegami
  • Admirable! But I reckon this is a reason why people get frightened when they scent Perl... ;-) – PerlDuck Dec 12 '17 at 19:53
  • @PerlDuck, Actually, it's amazing how simple and structured Perl made this. – ikegami Dec 12 '17 at 20:00
  • As a beginner, I couldn't understand this. Could you kindly provide a more detailed explanation of the method? – ssr1012 Dec 13 '17 at 06:49
  • I second @ssr1012's request. – Tohiko Dec 13 '17 at 13:02
  • I already explained the methodology in the answer. The rest is documented in [perlre](http://perldoc.perl.org/perlre.html), but it's either elementary regex, or obvious (`(?&NAME)` matches the pattern defined by `(?<NAME> ... )`). – ikegami Dec 13 '17 at 15:14

This is far from "obvious". There is no direct way to say "don't match" for a complex pattern (there is good support at the character level, with [^a], \S, etc.). Regex is first of all about matching things, not about not matching them.

One approach is to match those (possibly nested) delimiters and get everything other than that.

A good tool for finding nested delimiters is the core module Text::Balanced. As it matches it can also give us the substring before the match and the rest of the string after the match.

use warnings;
use strict;
use feature 'say';

use Text::Balanced qw(extract_bracketed);

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
   )
END

my ($match, $before);
my $remainder = $text;
while (1) {
    ($match, $remainder, $before) = extract_bracketed($remainder, '(', '[^(]*');
    print $before // $remainder;
    last if not defined $match; 
}

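In list context, extract_bracketed returns the match, the remainder of the string ($remainder), and the substring before the match ($before), so we keep matching in the remainder. Once no more brackets are found, the match and the prefix come back as undef and the whole remaining string is returned as the remainder, which is why the loop prints $before // $remainder before exiting.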
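
To actually search the unparenthesized text rather than print it, the same loop can accumulate the pieces into a string. The following is only a sketch under the same assumptions as the code above; the names `$outside` and `@hits` are illustrative and not part of Text::Balanced.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
   )
END

# Collect everything that lies outside balanced parentheses.
my $outside   = '';      # hypothetical accumulator
my $remainder = $text;
while (1) {
    my ($match, $rest, $before) =
        extract_bracketed($remainder, '(', '[^(]*');
    if (defined $match) {
        $outside  .= $before;      # text that preceded the bracketed group
        $remainder = $rest;        # keep scanning after the group
    }
    else {
        $outside .= $remainder;    # no more groups: the rest is all outside
        last;
    }
}

my @hits = $outside =~ /foo/g;
say 'foo occurs ', scalar(@hits), ' time(s) outside parentheses';
# prints: foo occurs 1 time(s) outside parentheses
```

Note that positions in `$outside` no longer correspond to positions in the original text, since the bracketed groups have been dropped.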

Taken from this post, where there are more details and another way, using Regexp::Common.

zdim
  • I didn't know about this module. Thanks! Though, I am finding it difficult to find the line number when matching inside `$text` or `$lead`. One way could be to count the number of newline characters in `$match`. But is there a better way? – Tohiko Dec 13 '17 at 13:01
  • @Tohiko Welcome. You mean to find what char/line in the source is found? Counting `\n` in the `$lead` (or `$text`, as it's being depleted) won't inform what line it is in the source. I'll look into it. – zdim Dec 13 '17 at 21:09
  • @Tohiko Note that I changed `$lead` to `$before` and `$text` to `$remainder` – zdim Jan 21 '18 at 05:48
  • @Tohiko I didn't find a better way than counting. It is not tested well (works on _this_ example) and there are probably all kinds of cases for which it won't, but here is the idea. I add `$line_cnt=1` and after `print ...` add the line `$line_cnt += () = ($before =~ /\n/g) if $before;`. The module may provide a better way but it's been a while since I read the whole page carefully. Will update if I find it – zdim Jan 21 '18 at 05:48
  • @Tohiko Another feature is `$@->{pos}`, see [Diagnostics](https://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS). But in this case, because of recursive processing, I still have to count lines. – zdim Jan 21 '18 at 06:15
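
The line-counting idea from the comments can be sketched as follows. This is an untested-in-general illustration that assumes the sample text from the question; unlike the one-liner in the comment, it also counts the newlines inside each skipped group so the running line number stays aligned with the source. The names `$line`, `$chunk`, and `@found` are illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

my $text = "(bar foo bar)\nbar foo (\nbar foo \n   (bar) (foo)\n   )\n";

my $line = 1;    # source line on which the current chunk starts
my @found;       # source lines where foo appears outside parentheses
my $remainder = $text;
while (1) {
    my ($match, $rest, $before) =
        extract_bracketed($remainder, '(', '[^(]*');
    my $chunk = defined $match ? $before : $remainder;

    # Report each foo in the unparenthesized chunk with its source line.
    while ($chunk =~ /foo/g) {
        my $newlines_before = () = substr($chunk, 0, $-[0]) =~ /\n/g;
        push @found, $line + $newlines_before;
    }
    last if not defined $match;

    # Advance past the chunk and the skipped parenthesized group.
    $line += () = "$before$match" =~ /\n/g;
    $remainder = $rest;
}

say "foo found on line $_" for @found;
# prints: foo found on line 2
```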