Regex multi line capture text in and around html comment tag

Question

I'm still fairly green when it comes to regular expressions. This is what I am trying to achieve:

Source:

<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->

Expected Capture :

$1 = Text
$2 = <b>Text</b>
        <a href="google.com">Link</a>
        <div class="col"><h1>Nested Content</h1><p>More content</p>
        </div>
$3 = END of Text

Current Regex :

/\<\!-*( *[A-Za-z]*) *-*\>([\s\S\t\r]*)\<\!-*( *[A-Za-z]*) *-*\>/igm

The issue is that it's too greedy: it continues until the last comment in the source, so the match ends with:

$3 = Another Tag Comment

How do I refactor my regex so that the match ends at the expected capture?

[Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) — Biffen, Dec 18 '14 at 06:32
@Biffen In this case it is a known subset of HTML that is being parsed. I realise a parser would be a more ideal solution but this is a throw away tool I am working on. — bart_88, Dec 22 '14 at 04:23

score 1 · Accepted Answer · answered Dec 18 '14 at 06:39

<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->

You can try this. See the demo:

https://regex101.com/r/cA4wE0/17
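
As a rough illustration of how the pattern above behaves, here is a minimal sketch in Python. The choice of Python's `re` module is an assumption (the question does not say which regex flavor is in use), and `SOURCE`, `BLOCK`, and `capture_block` are hypothetical names introduced only for this example. The pattern itself is copied unchanged from this answer.

```python
import re

# Sample input copied from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# The pattern from this answer, unchanged.
#   (?:(?!-->).)*        any characters up to, but not including, the next "-->"
#   (?:(?!<!--)[\s\S])+  any characters (newlines too) up to the next "<!--"
BLOCK = re.compile(
    r"<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->"
)


def capture_block(text):
    """Return (opening comment, body, closing comment), trimmed, or None."""
    match = BLOCK.search(text)
    if match is None:
        return None
    # The raw groups keep the spaces inside the comments, so trim them here.
    return tuple(part.strip() for part in match.groups())


opening, body, closing = capture_block(SOURCE)
print(opening)  # Text
print(closing)  # END of Text
print(body)     # the markup between the two comments
```

Because each repeated step first checks that a comment delimiter does not start at the current position, the body cannot run past the second comment, which is what stops the match before `Another Tag Comment`.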

vks


score 0 · Answer 2 · answered Dec 18 '14 at 06:37

You need to make the inner pattern [\s\S]* non-greedy, and you also need to add \s or a space inside the last character class [A-Za-z]*. Add word boundaries (\b) in order to do an exact string match.

\<\!-* *([A-Za-z]*) *-*\>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*\>
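
To make the difference between the greedy and non-greedy quantifier concrete, here is a small sketch in Python. Python is an assumption, since the question does not name a regex flavor, and `HEAD`, `TAIL`, `LAZY`, and `GREEDY` are hypothetical names used only here. The pattern is the one above with the redundant backslashes before `<`, `!`, and `>` dropped, which should not change what it matches.

```python
import re

# Sample input copied from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# Opening comment: letters only, surrounding spaces and dashes are skipped.
HEAD = r"<!-* *([A-Za-z]*) *-*>"
# Closing comment: letters and spaces, anchored on word boundaries so the
# captured text has no leading or trailing space.
TAIL = r"<!-* *(\b[A-Za-z ]*\b) *-*>"

# Same pattern twice; only the quantifier on the middle group differs.
LAZY = re.compile(HEAD + r"([\s\S]*?)" + TAIL)
GREEDY = re.compile(HEAD + r"([\s\S]*)" + TAIL)

lazy_match = LAZY.search(SOURCE)
greedy_match = GREEDY.search(SOURCE)

print(lazy_match.group(1))    # Text
print(lazy_match.group(3))    # END of Text
print(greedy_match.group(3))  # Another Tag Comment
```

The lazy form stops at the first comment that can close the match, while the greedy form runs to the end of the input and backtracks to the last one.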
