
I'm still fairly green when it comes to regular expressions. What I am trying to achieve is:

Source:

<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->

Expected Capture :

$1 = Text
$2 = <b>Text</b>
        <a href="google.com">Link</a>
        <div class="col"><h1>Nested Content</h1><p>More content</p>
        </div>
$3 = END of Text

Current Regex :

/\<\!-*( *[A-Za-z]*) *-*\>([\s\S\t\r]*)\<\!-*( *[A-Za-z]*) *-*\>/igm

The issue is that it's too greedy: the match continues to the last comment in the source, ending with:

$3 = Another Tag Comment 

How do I refactor my regex so that it ends with the expected capture?

bart_88
  • [Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) – Biffen Dec 18 '14 at 06:32
  • @Biffen In this case it is a known subset of HTML that is being parsed. I realise a parser would be a better solution, but this is a throwaway tool I am working on. – bart_88 Dec 22 '14 at 04:23

2 Answers

<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->

You can try this. See the demo:

https://regex101.com/r/cA4wE0/17
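To try the pattern outside regex101, here is a minimal sketch in Python, assuming Python's `re` module is an acceptable stand-in for the regex flavour in the question. The names `SOURCE`, `TEMPERED` and `capture_block` are made up for illustration. The `(?!-->)` and `(?!<!--)` lookaheads stop each group at the next comment delimiter, so the match cannot run on to the last comment. The comment groups keep their surrounding spaces, so the sketch strips them.

```python
import re

# The sample source from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# The pattern from this answer, unchanged.
TEMPERED = re.compile(
    r"<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->"
)


def capture_block(text):
    """Return (opening comment, body, closing comment), or None if nothing matches."""
    match = TEMPERED.search(text)
    if match is None:
        return None
    opening, body, closing = match.groups()
    # Groups 1 and 3 include the spaces inside the comment, so trim them.
    return opening.strip(), body.strip(), closing.strip()


result = capture_block(SOURCE)
print(result[0])  # Text
print(result[2])  # END of Text
```

The middle element of the tuple holds the markup between the two comments, starting at `<b>Text</b>` and ending at `</div>`.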

vks

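You need to make the inner pattern [\s\S]* non-greedy ([\s\S]*?), and you also need to add \s or a space inside the last character class, [A-Za-z]*, so that it can match a multi-word comment such as END of Text. Add word boundaries (\b) in order to match the exact string, without the surrounding spaces.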

\<\!-* *([A-Za-z]*) *-*\>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*\>
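The effect of the `?` can be checked with a short Python sketch, again assuming Python's `re` module behaves like the asker's regex flavour here. `SOURCE`, `LAZY` and `GREEDY` are illustrative names, and `GREEDY` is the same pattern with the `?` removed, added only for comparison.

```python
import re

# The sample source from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# The pattern from this answer: the middle group is lazy ([\s\S]*?).
LAZY = re.compile(
    r"\<\!-* *([A-Za-z]*) *-*\>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*\>"
)

# Comparison only (not from the answer): same pattern with a greedy middle group.
GREEDY = re.compile(
    r"\<\!-* *([A-Za-z]*) *-*\>([\s\S]*)<!-* *(\b[A-Za-z ]*\b) *-*\>"
)

lazy_match = LAZY.search(SOURCE)
greedy_match = GREEDY.search(SOURCE)

print(lazy_match.group(1))    # Text
print(lazy_match.group(3))    # END of Text
print(greedy_match.group(3))  # Another Tag Comment
```

The lazy group stops at the first closing comment it can reach, while the greedy one backtracks from the end of the input and so settles on the last comment.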


Avinash Raj