
I have HTML text containing <!--[if gte mso 9]> and <![endif]--> tags, and I want to remove everything between those two tags. I'm using Ruby's gsub method with a regular expression, but it doesn't work.

This is what I've tried:

text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

text2 = text.gsub /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/, ""

What I want as an answer is:

text2 = "<!--[if gte mso 9]><![endif]-->"

Or even:

text2 = ""

I tried this based on this article

I tried it in this online regex tester and it seems to be the right way to do it, but it doesn't work in my program!

Please help!

Thanks in advance!


1 Answer


Try the regex /(?<=<!--\[if gte mso 9\]>).*?(?=<!\[endif\]-->)/m and call gsub with it on the string. The result is <!--[if gte mso 9]><![endif]-->

  • (?<=<!--\[if gte mso 9\]>) is a positive lookbehind: it requires <!--[if gte mso 9]> to appear immediately before the match, but doesn't include it in the result.

  • .*? matches any character 0 or more times, as few times as possible (lazy), so the match stops at the first closing tag.

  • (?=<!\[endif\]-->) is a positive lookahead: it requires <![endif]--> to appear immediately after the match, but doesn't include it in the result.

  • The m modifier at the end makes . match newline characters as well, so the match can span multiple lines. Since you declared your string with "", the \n is interpreted as a newline.

Essentially, you are matching everything in between the two tags.
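As a minimal sketch of the steps above, here is the pattern applied to the sample string from the question. The variable name inner is only for illustration, and the "[...]" is the question's own placeholder for the elided Word markup:

```ruby
# Sample input copied from the question; "[...]" stands in for the elided markup.
text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

# Lookbehind for the opening tag, lazy match across newlines (/m),
# lookahead for the closing tag. Only the content in between is matched.
inner = /(?<=<!--\[if gte mso 9\]>).*?(?=<!\[endif\]-->)/m

text2 = text.gsub(inner, "")
puts text2
# prints: <!--[if gte mso 9]><![endif]-->
```

Because the tags are only asserted and never consumed, they survive the substitution.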

In your regex, /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/, you used a positive lookahead for the first tag and a positive lookbehind for the second tag; you need to flip them. It also lacks the m modifier, so . cannot get past the first \n and nothing matches.

  • A positive lookahead asserts that a pattern follows the main expression, without including it in the result.

  • A positive lookbehind asserts that a pattern precedes the main expression, without including it in the result.
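To illustrate the difference, the sketch below runs the pattern from the question unchanged, then a variant without any lookarounds. The second pattern is not part of the original answer; it is one possible way to get the empty-string result the question also mentions, and the variable names are made up for this example:

```ruby
# Sample input copied from the question.
text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

# Pattern from the question: without /m, "." stops at the first "\n",
# so the lookbehind for the closing tag is never satisfied and nothing matches.
original = /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/
unchanged = text.gsub(original, "")
p unchanged == text
# prints: true

# No lookarounds: both tags are consumed by the match, so they are removed too.
whole_block = /<!--\[if gte mso 9\]>.*?<!\[endif\]-->/m
stripped = text.gsub(whole_block, "")
p stripped
# prints: ""
```

Use the lookaround version when the conditional-comment tags should stay in the output, and the plain version when the whole block should go.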

davidhu