
This pattern does the job:

(?:\G(?!\A)|begin).*?\K(keyword)(?=.*end)

String:

begin
keyword
keyword 
end

I get what I want (keyword, keyword) in a single capture group, but if the string looks like this:

begin
keyword
keyword 
end
keyword
end

I get three matches. How can I stop at the first end?

Can this pattern be improved or optimized?


brian d foy
The nothing
  • `(?:\G(?!\A)|begin)(?:(?!begin|end).)*?\K(keyword)(?=.*end)` or `(?:\G(?!\A)|begin)(?:(?!begin|end).)*?\K(keyword)(?=(?:(?!begin).)*end)` – Wiktor Stribiżew Mar 29 '20 at 12:19
  • Why don't you just match `begin` and `end` too and then extract a capture group? That would simplify the pattern. – ssc-hrep3 Mar 29 '20 at 12:20
  • Try: search for `(?s).*?(begin)(?-s).*\R((?:.*\R)*?).*(end)(?s).*` and replace with `\1\r\2\3`; see https://regex101.com/r/Xw1ueP/3 – Just Me Mar 29 '20 at 13:06
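A minimal Perl sketch comparing the question's pattern with the first pattern from Wiktor Stribiżew's comment, run on the question's second sample string. It assumes the `/s` flag is on (without it `.` cannot cross the newlines between lines; the linked demo presumably had it set) and uses `m//g` in list context, where `\G` still refers to the end of the previous match.

```perl
use strict;
use warnings;

# Second sample string from the question: a stray "keyword" follows the first "end".
my $string = "begin\nkeyword\nkeyword \nend\nkeyword\nend\n";

# Original pattern: the lazy .*? is free to step over the first "end".
my @original = $string =~ /(?:\G(?!\A)|begin).*?\K(keyword)(?=.*end)/gs;

# Tempered version: (?:(?!begin|end).)*? may not cross "begin" or "end",
# so the \G chain breaks at the first "end".
my @tempered = $string =~ /(?:\G(?!\A)|begin)(?:(?!begin|end).)*?\K(keyword)(?=.*end)/gs;

print scalar(@original), " matches with the original pattern\n";   # 3
print scalar(@tempered), " matches with the tempered pattern\n";   # 2
```

The only change is the tempered token: in the original, nothing stops `.*?` from consuming `end`, so the `\G` chain carries on into the stray `keyword`.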

3 Answers


I would hate to run across such a regex in code. Any small change and it's broken.

I'd open a filehandle on a reference to the string, then read its lines. Skip everything until you run into the starting line, then read everything up to the ending line:

use v5.26;

my $string =<<~'HERE';
    begin
    keyworda
    keywordb
    end
    keywordc
    end
    HERE

open my $fh, '<', \$string;

while( <$fh> ) { last if /\Abegin/ }

my @keywords;
while( <$fh> ) {
    last if /^end/;
    chomp;
    push @keywords, $_;
    }

say join "\n", @keywords;

This outputs:

keyworda
keywordb

Or, break it up into two regexes. One sets the starting position, then you repeatedly match as long as the line isn't the ending line. This is a bit cleaner, but some people may be confused by the global matching in scalar context:

use v5.26;

my $string =<<~'HERE';
    begin
    keyworda
    keywordb
    end
    keywordc
    end
    HERE

my @keywords;
if( $string =~ / ^ begin \R /gmx ) {
    while( $string =~ /\G (?!end \R) (\N+) \R /gx ) {
        push @keywords, $1;
        }
    }

say join "\n", @keywords;
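To make the scalar-context `/g` behaviour less mysterious, here is a small sketch (sample string shortened for illustration) that records `pos()` after each match. A successful scalar `m//g` stores the end of the match in `pos($string)`, and `\G` anchors the next match there; the loop ends when `(?!end\n)` fails, and that failed match resets `pos`.

```perl
use strict;
use warnings;

my $string = "begin\nkeyworda\nkeywordb\nend\n";

# Scalar-context m//g: on success, pos($string) is set to where the match ended
die "no begin line\n" unless $string =~ /^begin\n/gm;
my $start = pos($string);

# \G pins each later match to pos(), so matching resumes right after "begin\n"
my @seen;
while ( $string =~ /\G(?!end\n)(\N+)\n/g ) {
    push @seen, "$1 (pos now " . pos($string) . ")";
}

print "after begin: pos = $start\n";   # after begin: pos = 6
print "$_\n" for @seen;                 # keyworda (pos now 15), then keywordb (pos now 24)
```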
brian d foy

Use a regular expression and store the matches in an array:

my @result = $data =~ /begin\n(.*?)\nend/sg;

Then print them to the console:

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $data = do { local $/; <DATA> };

my @result = $data =~ /begin\n(.*?)\nend/sg;

say '-' x 35 . "\n" . $_ for @result;

__DATA__
begin
keyword 1
keyword 2
end
keyword
end
keyword
begin
keyword 3
keyword 4
end
keyword 
keyword 

Output

-----------------------------------
keyword 1
keyword 2
-----------------------------------
keyword 3
keyword 4
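Each capture above is a whole block, with `keyword 1` and `keyword 2` still joined by a newline. If one item per keyword is wanted, as in the question, a short follow-up sketch (sample data trimmed from the answer's `__DATA__`) splits every captured block on newlines:

```perl
use strict;
use warnings;

my $data = "begin\nkeyword 1\nkeyword 2\nend\nkeyword\nend\nkeyword\nbegin\nkeyword 3\nkeyword 4\nend\n";

# Capture the text between each begin/end pair, then split each block into lines
my @keywords = map { split /\n/, $_ } $data =~ /begin\n(.*?)\nend/sg;

print "$_\n" for @keywords;
```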
Polar Bear

You can put a negative lookahead inside the group (a tempered greedy token) to fetch the data between begin and end:

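 use strict;
 use warnings;
 use Data::Dumper;

 my $data = "begin\nkeyword\nkeyword \nend\nkeyword\nend\n";   # sample from the question
 # the group consumes a character only where neither "begin" nor "end" starts
 my @keyws = ($data =~ /begin((?:(?!begin|end).)*)end/sg);

 print Dumper(\@keyws);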

This is how I do it for LaTeX.
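One caveat with a bare `begin|end` alternation: the markers also match inside longer words. The sketch below uses a hypothetical line `append` to show the effect, plus one possible fix that requires `\b` word boundaries around the markers (an assumption about what should count as a marker; anchoring to whole lines with `/m` would be another option):

```perl
use strict;
use warnings;

# Hypothetical input: "append" contains the letters "end"
my $data = "begin\nappend\nkeyword\nend\nkeyword\nend\n";

# Tempered token as in the answer: it stops at the "end" inside "append"
my ($loose) = $data =~ /begin((?:(?!begin|end).)*)end/s;

# Same idea, but begin/end only count as whole words
my ($strict) = $data =~ /\bbegin\b((?:(?!\b(?:begin|end)\b).)*)\bend\b/s;

# split ' ' skips leading whitespace and splits on runs of whitespace
my @loose_words  = split ' ', $loose;
my @strict_words = split ' ', $strict;

print "loose:  @loose_words\n";    # loose:  app
print "strict: @strict_words\n";   # strict: append keyword
```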

brian d foy
ssr1012