
Supposing I have some HTML which contains comments wrapping tables like this:

<!--product_template -->
    <table class='template'>{blah}</table>
<!--end_product_template-->

Sometimes the comment can have extra text in it like this:

<!--product_template *THIS IS MORE COMMENT TEXT* -->
    <table class='template'>{blah}</table>
<!--end_product_template-->

BUT... it is occasionally possible for user error to create something like this:

<!--product_template *THIS IS MORE COMMENT TEXT* -->
    <p>&nbsp;</p>
<!--end_product_template-->

I need to be able to find these comment sections and read their content only where a table is contained.

I have this simple Regex <!--product_template.*?<table.*?<!--end_product_template-->

Which almost works:

https://regex101.com/r/PpDj3y/3

BUT... as you can see from the fiddle, it captures any table between any <!--product_template and any <!--end_product_template-->. It needs to capture only between a <!--product_template and the first <!--end_product_template--> that follows it.

I can't figure out how to do that! The first two matches in that fiddle are correct, but the third one contains too much and should only start capturing at the start of the last line. The comment sections that do not contain a table should not be captured.

EDIT:

Not a duplicate question? My question is asking for text between two strings only if they contain certain other text, the question cited as a duplicate is only concerned with finding text between two strings excluding the specific additional requirement that those strings must also contain "

Jamie Hartnoll

1 Answer

The regex that you need is a regex based on a tempered greedy token:

<!--product_template(?:(?!<!--product_template).)*?<table.*?<!--end_product_template-->

See the regex demo.

The main point is that .*? is lazy only in the sense that it expands one character at a time from the leftmost starting position; it still matches as many characters as necessary to return a valid match. So .*? may overflow past a block's own <!--end_product_template--> and across the next <!--product_template substring if the previous block does not contain a <table substring. The tempered greedy token will prevent this.
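To make the overflow concrete, here is a minimal Python sketch using the original pattern from the question. The sample string and the variable names are made up for illustration, and re.DOTALL is an assumption (the blocks span several lines, so . has to match newlines; the flags used in the demo are not stated here).

```python
import re

# Hypothetical input: block A has no table, block B does.
html = (
    "<!--product_template A -->\n"
    "    <p>&nbsp;</p>\n"
    "<!--end_product_template-->\n"
    "<!--product_template B -->\n"
    "    <table class='template'>{blah}</table>\n"
    "<!--end_product_template-->\n"
)

# The pattern from the question. re.DOTALL is assumed so that "." crosses line breaks.
naive = re.compile(
    r"<!--product_template.*?<table.*?<!--end_product_template-->",
    re.DOTALL,
)

match = naive.search(html)

# The match begins at block A and runs through block B, because the first
# lazy ".*?" keeps expanding until it finally reaches a "<table".
print(match.start())                      # offset where the match begins
print("<p>&nbsp;</p>" in match.group(0))  # the table-less block is swallowed
```

The lazy quantifier never reconsiders where the match started; it only decides how far to extend from that fixed starting point.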

Details:

  • <!--product_template - a literal substring
  • (?:(?!<!--product_template).)*? - any char that does not start a <!--product_template char sequence, as few as possible, up to the first occurrence of the subsequent subpattern.
  • <table - a literal substring
  • .*? - any 0+ chars as few as possible up to the first...
  • <!--end_product_template--> - a literal substring
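The pattern broken down above can be tried out with a short Python sketch. The sample text and the names TEMPLATE_WITH_TABLE, sample and blocks are hypothetical, and re.DOTALL is again an assumption so that . also matches line breaks.

```python
import re

# The tempered greedy token pattern from the answer.
# re.DOTALL is assumed because each block spans several lines.
TEMPLATE_WITH_TABLE = re.compile(
    r"<!--product_template(?:(?!<!--product_template).)*?"
    r"<table.*?<!--end_product_template-->",
    re.DOTALL,
)

# Hypothetical input: three blocks, the middle one has no table.
sample = (
    "<!--product_template -->\n"
    "    <table class='template'>{one}</table>\n"
    "<!--end_product_template-->\n"
    "<!--product_template *THIS IS MORE COMMENT TEXT* -->\n"
    "    <p>&nbsp;</p>\n"
    "<!--end_product_template-->\n"
    "<!--product_template *THIS IS MORE COMMENT TEXT* -->\n"
    "    <table class='template'>{two}</table>\n"
    "<!--end_product_template-->\n"
)

# The pattern has no capturing groups, so findall returns whole matches.
blocks = TEMPLATE_WITH_TABLE.findall(sample)

for block in blocks:
    print(block)
    print("---")
```

With this input, the attempt that starts at the table-less block fails as soon as the token reaches the next <!--product_template, so only the two blocks containing a table are returned.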
Wiktor Stribiżew