
I want to extract a string from a piece of text. This string must start and end with certain strings.

Example:

Word 1 = "Hello"
Word 2 = "World"

Text:

Hello, this is a sentence.
The whole World can read this.
What World?

The piece of text I want to extract is:

Hello, this is a sentence.
The whole World

What kind of regular expression should I use to extract this string?

Note: the string 'World' occurs twice.

Thanks

Mats Stijlaart

2 Answers

^\bHello\b.*?\bWorld\b

Here the "." must also match newlines. Note the word boundaries \b: they make sure that only the whole words Hello and World match, not occurrences of them inside longer words.

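# /s makes "." match newlines, so the match can span several lines
if ($subject =~ m/^\bHello\b.*?\bWorld\b/s) {
    $result = $&;  # $& holds the text of the whole match
}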

Note the s modifier, which instructs . to match newline characters too.
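
A minimal sketch of how the lazy quantifier and the s modifier interact, run against the sample text from the question. The variable names ($text, $lazy, $greedy, $no_s) are illustrative only. The sketch also drops the leading ^ so that Hello may appear anywhere in the text, which is an assumption about the intended use:

```perl
use strict;
use warnings;

# Sample text from the question.
my $text = "Hello, this is a sentence.\n"
         . "The whole World can read this.\n"
         . "What World?\n";

# Lazy ".*?" with /s: stops at the first "World", even across a newline.
my ($lazy) = $text =~ /(\bHello\b.*?\bWorld\b)/s;

# Greedy ".*" with /s: runs on to the last "World" in the text.
my ($greedy) = $text =~ /(\bHello\b.*\bWorld\b)/s;

# Without /s the dot cannot cross a newline, so nothing matches here,
# because "Hello" and "World" never share a line.
my ($no_s) = $text =~ /(\bHello\b.*?\bWorld\b)/;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
print "no /s:  ", (defined $no_s ? "[$no_s]" : "no match"), "\n";
```

Capturing into a variable with parentheses avoids relying on $&; both approaches give the same text for the lazy case.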

FailedDev

The simplest option is a lazy quantifier (*?). This matches from the first Hello to the first World. (Remember the /s flag, for dot-all mode.)

Hello.*?World

This can be a problem if you don't want the captured text to contain another Hello in between. A sneakier option then is:

Hello(?:(?!Hello|World).)*World

Or

Hello(?:(?!Hello).)*?World
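
To see how these patterns differ from the plain lazy version, here is a small sketch on a made-up input that contains Hello twice before the first World. The sample string and the variable names are assumptions for illustration, not part of the question:

```perl
use strict;
use warnings;

# Hypothetical input: two "Hello"s before the first "World".
my $text = "Hello one.\nHello two, World.\nMore World.";

# Plain lazy match: starts at the FIRST "Hello", so it contains a second one.
my ($lazy) = $text =~ /(Hello.*?World)/s;

# Each character is consumed only if neither "Hello" nor "World" starts there,
# so the match must begin at the last "Hello" before a "World".
my ($tempered) = $text =~ /(Hello(?:(?!Hello|World).)*World)/s;

# Same idea, but only "Hello" is forbidden; the lazy "*?" stops at the first "World".
my ($tempered_lazy) = $text =~ /(Hello(?:(?!Hello).)*?World)/s;

print "lazy:          [$lazy]\n";
print "tempered:      [$tempered]\n";
print "tempered lazy: [$tempered_lazy]\n";
```

On this input the plain lazy pattern returns the text starting at the first Hello, while both lookahead variants return only "Hello two, World".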
Kobi