
I want to run several patterns against a large multiline text:

$text =~ m#finance(.*?)end#s;

$text =~ m#<class>(.*?)</class>#s;

$text =~ m#/data(.*?)<end>#s;

If any one of them matches, I want to print the captured text (print $1) and then keep matching all three patterns against the rest of the text.

How can I get the printed results in the order they appear in the whole text?

Many thanks for your help!

Nikhil Jain
Qiang Li

1 Answer

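# Try all three patterns in one regex; /g resumes after the previous match.
while ($text =~ m#(?: finance (.*?) end
                  |   <class> (.*?) </class>
                  |   /data   (.*?) <end>
                  )
                 #sgx) {
  print $+;   # the capture from whichever alternative matched
}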

ought to do it.

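$+ holds the text of the highest-numbered capturing group that took part in the last successful match. Only one alternative can match at a time, so here it is always the capture from whichever pattern matched.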
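Here is a minimal sketch of that loop run against a made-up sample string (the text and the @found array are my own assumptions, not from the question). The matches are collected in the order they occur in the text, not in the order the alternatives are written. Note that under /x the spaces in the pattern are ignored, so the captures keep the whitespace around the words.

```perl
use strict;
use warnings;

# Hypothetical sample input: the <class> block comes first in the text,
# even though "finance" is the first alternative in the regex.
my $text = "<class>beta</class> finance alpha end /data gamma <end>";

my @found;
while ($text =~ m#(?: finance (.*?) end
                  |   <class> (.*?) </class>
                  |   /data   (.*?) <end>
                  )
                 #sgx) {
    push @found, $+;    # only one of $1, $2, $3 is defined; $+ picks it
}

# Show each capture in brackets so the surrounding whitespace is visible.
print "[$_]\n" for @found;
```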

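The /g modifier is intended specifically for this kind of usage. In scalar context (such as a while condition) it turns the regex into an iterator that, when resumed, continues the match where it left off instead of restarting at the beginning of $text. Each attempt finds the leftmost match among the alternatives, which is why the results come out in the order they appear in the text.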
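If you want to see the iterator at work, pos() reports the offset where the next attempt will start. A small sketch, with a sample string and an @resume array that I made up for illustration:

```perl
use strict;
use warnings;

my $text = "finance A end, finance B end";   # hypothetical sample

my @resume;
while ($text =~ m#finance (.*?) end#sgx) {
    # pos($text) is the offset just past the match that was found,
    # which is where the next iteration resumes.
    push @resume, pos($text);
}

print "resumed at: @resume\n";
# Once the match finally fails, the position is reset.
print "pos is reset\n" unless defined pos($text);
```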

(And /x lets you use arbitrary whitespace, meaning you can make your regexes readable. Or as readable as they get, at least.)
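One thing to watch with /x: because whitespace in the pattern is ignored, a space you actually want to match has to be written explicitly, for example as [ ]. A rough sketch of the difference (the variable names are mine, just for illustration):

```perl
use strict;
use warnings;

my $compact  = qr/finance (.*?) end/s;            # no /x: the spaces are literal
my $extended = qr/finance (.*?) end/sx;           # /x: the spaces are ignored
my $literal  = qr/finance [ ] (.*?) [ ] end/sx;   # /x: [ ] matches one real space

my ($c) = "finance A end" =~ $compact;    # spaces are matched outside the group
my ($e) = "finance A end" =~ $extended;   # spaces end up inside the group
my ($l) = "finance A end" =~ $literal;    # spaces are matched outside the group

print "[$c] [$e] [$l]\n";
```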

If you need to deal with multiple captures, it becomes a bit harder as you can't use $+. You can, however, test for capturing groups being defined:

while ($text =~ m#(?: a (.*?) b (.*?) c
                  |   d (.*?) e (.*?) f
                  |   /data     (.*?) <end>
                  )
                 #sgx) {
  if (defined $1) {
    # first set matched: use $1 and $2 (don't need to check $2)
  }
  elsif (defined $3) {
    # second set matched: use $3 and $4
  }
  else {
    # final one matched: use $5
  }
}
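For a version you can actually run, here is a sketch along the same lines. The BEGIN/MID/END and <k>...</k> markers, the sample text and the @out array are all invented for the example; substitute your own patterns. I added \s* around the groups so the captures come out without surrounding whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample input using made-up markers.
my $text = "<k>x=1</k> BEGIN p MID q END /data z <end>";

my @out;
while ($text =~ m#(?: BEGIN \s* (.*?) \s* MID \s* (.*?) \s* END
                  |   <k>   \s* (.*?) \s* =   \s* (.*?) \s* </k>
                  |   /data \s* (.*?) \s* <end>
                  )
                 #sgx) {
    if (defined $1) {
        push @out, "first: $1 $2";     # groups 1 and 2 belong to alternative 1
    }
    elsif (defined $3) {
        push @out, "second: $3 $4";    # groups 3 and 4 belong to alternative 2
    }
    else {
        push @out, "third: $5";        # group 5 belongs to alternative 3
    }
}

print "$_\n" for @out;
```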
geekosaur
  • The branch-reset operator would be useful here: `(?|⋯|⋯|⋯|⋯)`. – tchrist Mar 18 '11 at 13:23
  • I habitually stick with 5.00503 compatibility because that was the standard Perl at my previous employer. (Universities....) – geekosaur Mar 18 '11 at 18:15
  • @geekosaur: I ran into a question today while working on this. If a match has more than one capture, for example `while ($text =~ m#(?: finance (.*?) ac (.*?) end | (.*?) bb (.*?) | data (.*?) ) #sgx) {`, how do I select the captures using `$+`? I tried `$1+`, which does not work. Thanks again. – Qiang Li Apr 12 '11 at 02:21
  • That one's a bit harder; `$+` means "the last group that successfully matched". It is not a modifier, and you can't combine it with other specifiers. You *can* test whether a capturing group is `defined`. I'll update the answer above to cover that. – geekosaur Apr 12 '11 at 19:59
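For anyone on Perl 5.10 or later, the branch-reset group tchrist mentions could look roughly like this. Inside `(?|...)` every alternative numbers its groups starting from 1, so the capture is always in `$1` and neither `$+` nor `defined` checks are needed. The sample text and `@found` are assumptions made for the sketch.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs Perl 5.10 or later

# Hypothetical sample input.
my $text = "<class>beta</class> finance alpha end /data gamma <end>";

my @found;
while ($text =~ m#(?| finance (.*?) end
                  |   <class> (.*?) </class>
                  |   /data   (.*?) <end>
                  )
                 #sgx) {
    push @found, $1;    # every alternative captures into $1
}

# Show each capture in brackets so the surrounding whitespace is visible.
print "[$_]\n" for @found;
```

With several captures per alternative, the same idea gives you `$1`, `$2`, and so on for whichever alternative matched; groups an alternative does not have are left undefined.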