10

How do I regex match everything that is between two strings? The text between the two strings spans several lines and can contain any HTML characters.

For example:

<p>something</p>

<!-- OPTIONAL -->

<p class="sdf"> some text</p>
<p> some other text</p>

<!-- OPTIONAL END -->

<p>The end</p>

I want to strip out the whole optional part, but the greedy any-character match isn't doing what I wanted. The patterns I've tried are:

  • <!-- OPTIONAL -->.*<!-- OPTIONAL END -->
  • <!-- OPTIONAL -->(.*)<!-- OPTIONAL END -->
  • <!-- OPTIONAL -->(.*)\s+<!-- OPTIONAL END -->
  • (?=<!-- OPTIONAL -->)(.*)\s+<!-- OPTIONAL END -->

All of them match the first optional tag when only the first part of the pattern is given, but none of them work with the complete pattern across several lines.

Here's an example: http://regexr.com?352bk

Thanks

LocustHorde

4 Answers

8

To make a quantifier non-greedy (lazy), put a ? after the *:

<!-- OPTIONAL -->(.*?)<!-- OPTIONAL END -->

Does this help you?

Also, depending on the programming language you use, there are modifiers that make the regex dot (.) match newlines too. In PHP, for example, that is the s (dotall) modifier:

http://php.net/manual/en/reference.pcre.pattern.modifiers.php
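As a rough illustration of combining both suggestions (lazy quantifier plus dotall), here is a minimal Python sketch. Python is chosen only for the example; the variable names `html`, `pattern`, and `stripped` are made up, and the sample text is the snippet from the question.

```python
import re

# Sample document from the question (illustrative input).
html = """<p>something</p>

<!-- OPTIONAL -->

<p class="sdf"> some text</p>
<p> some other text</p>

<!-- OPTIONAL END -->

<p>The end</p>
"""

# .*? is the lazy form of .*, and re.DOTALL lets "." match newlines too,
# so the match can span several lines but stops at the first END marker.
pattern = re.compile(r"<!-- OPTIONAL -->.*?<!-- OPTIONAL END -->", re.DOTALL)

# Replace the whole optional block (markers included) with nothing.
stripped = pattern.sub("", html)
print(stripped)
```

The surrounding paragraphs are left untouched; only the text from the opening marker through the closing marker is removed.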

gitaarik
7

Check the dotall checkbox in RegExr :)

Without the dotall flag (the s in /regex/s), a dot (.) won't match line breaks.

You should use .*? instead of .* to lazy match the optional content (see the PLEASE DO NOT MATCH! sentence in the examples).
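To see why the lazy form matters, the following Python sketch compares .* and .*? on a made-up input with two optional blocks and a line between them that must survive. The input string and variable names are assumptions for illustration, loosely modelled on the RegExr example.

```python
import re

# Hypothetical input: two optional blocks with a line in between to keep.
text = (
    "<!-- OPTIONAL -->\nfirst block\n<!-- OPTIONAL END -->\n"
    "<p>PLEASE DO NOT MATCH!</p>\n"
    "<!-- OPTIONAL -->\nsecond block\n<!-- OPTIONAL END -->\n"
)

greedy = re.compile(r"<!-- OPTIONAL -->.*<!-- OPTIONAL END -->", re.DOTALL)
lazy = re.compile(r"<!-- OPTIONAL -->.*?<!-- OPTIONAL END -->", re.DOTALL)

# Greedy runs from the first opening marker to the LAST closing marker.
greedy_matches = greedy.findall(text)

# Lazy stops at the nearest closing marker, giving one match per block.
lazy_matches = lazy.findall(text)

print(len(greedy_matches), len(lazy_matches))
```

With the greedy pattern, the line between the two blocks ends up inside the single match; with the lazy pattern, each block is matched on its own.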

sp00m
  • Aah! what is dotall, and what does it do, please? – LocustHorde May 30 '13 at 15:31
  • @LocustHorde By default the wildcard character in regex (`.`) doesn't match newline characters, meaning that a `.*` match stops at the end of the line. With dotall enabled, `.` also matches newline characters. – Bad Wolf May 30 '13 at 15:35
4

Playing with your example, I think I found the answer. Try this in your code:

<!-- OPTIONAL -->[\w\W]*<!-- OPTIONAL END -->

I hope this helps.
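The [\w\W] class matches any character, newlines included, so it works without a dotall flag. Note, though, that [\w\W]* is still greedy. The Python sketch below, on a made-up input with two blocks, contrasts it with the lazy variant [\w\W]*?; the names `text`, `greedy_result`, and `lazy_result` are illustrative only.

```python
import re

# Hypothetical input with two optional blocks around a line to keep.
text = (
    "<!-- OPTIONAL -->\na\n<!-- OPTIONAL END -->\n"
    "keep me\n"
    "<!-- OPTIONAL -->\nb\n<!-- OPTIONAL END -->\n"
)

# No flags needed: [\w\W] (word or non-word character) already covers newlines.
# Greedy: removes everything from the first marker to the last END marker.
greedy_result = re.sub(r"<!-- OPTIONAL -->[\w\W]*<!-- OPTIONAL END -->", "", text)

# Lazy: removes each block separately and leaves the line in between.
lazy_result = re.sub(r"<!-- OPTIONAL -->[\w\W]*?<!-- OPTIONAL END -->", "", text)

print(repr(greedy_result))
print(repr(lazy_result))
```

When the document contains only one optional block, both forms give the same result; the difference shows up as soon as there are several.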

2

Enable the "dotall" option so that the . in regex will match newline characters and work across multiple lines. There are various ways to do this depending on your regex implementation, so check the manual for the one you use.
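As one concrete case, Python's re module offers two ways to turn dotall on: the re.DOTALL flag (short form re.S) or the inline (?s) prefix inside the pattern. The sketch below uses a made-up one-block input; other implementations spell the option differently.

```python
import re

# Hypothetical input: markers on separate lines from the content.
text = "<!-- OPTIONAL -->\nhidden\n<!-- OPTIONAL END -->"

# Without dotall, "." cannot cross the newlines, so nothing matches.
no_flag = re.search(r"<!-- OPTIONAL -->(.*?)<!-- OPTIONAL END -->", text)

# Option 1: pass the flag as an argument (re.S is an alias for re.DOTALL).
with_flag = re.search(r"<!-- OPTIONAL -->(.*?)<!-- OPTIONAL END -->", text, re.S)

# Option 2: enable it inside the pattern with the inline (?s) prefix.
inline = re.search(r"(?s)<!-- OPTIONAL -->(.*?)<!-- OPTIONAL END -->", text)

print(no_flag, repr(with_flag.group(1)), repr(inline.group(1)))
```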

Bad Wolf