Remove from <!--[if gte mso 9]> to <![endif]--> in HTML [REGEX, RUBY]

Question

I have an HTML text with <!--[if gte mso 9]> and <![endif]--> tags. I want to remove everything between those two tags. I'm using Ruby's gsub with a regex, but it doesn't work.

This is what I've tried:

text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

text2 = text.gsub /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/, ""

What I want as a result is:

text2 = "<!--[if gte mso 9]><![endif]-->"

Or even:

text2 = ""

I tried this based on this article

I've tried it in this online regex tester and it seems to be the right approach, but it doesn't work in my program!

Please help!

Thanks in advance!

davidhu · Accepted Answer · 2016-08-23T01:02:03.377

Try this regex and do a gsub on the string:

/(?<=<!--\[if gte mso 9\]>).*(?=<!\[endif\]-->)/m

You will get:

<!--[if gte mso 9]><![endif]-->

(?<=<!--\[if gte mso 9\]>) is a positive lookbehind: it requires the match to be preceded by <!--[if gte mso 9]>, but doesn't include that text in the match.
.* matches any character, zero or more times.
(?=<!\[endif\]-->) is a positive lookahead: it requires the match to be followed by <![endif]-->, but doesn't include that text in the match.
The m flag at the end makes . match newlines as well. Since you declared your string with double quotes, each \n is interpreted as a real newline, so without m the match could not get past the first line break.
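To see the effect of m in isolation, here is a tiny sketch on a made-up two-line string (not taken from the question):

```ruby
# Without m, . does not match "\n", so the pattern cannot span the line break.
without_m = "a\nb" =~ /a.b/
# With m, . also matches "\n", so the pattern matches at index 0.
with_m = "a\nb" =~ /a.b/m

puts without_m.inspect # prints nil
puts with_m.inspect    # prints 0
```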

Essentially, you are matching everything in between the two tags.
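Put together as a runnable snippet, using the sample string from the question (the [...] inside it is just literal placeholder text):

```ruby
text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

# Lookbehind for the opening tag, lookahead for the closing tag:
# only the text between them is matched, so only that text is removed.
pattern = /(?<=<!--\[if gte mso 9\]>).*(?=<!\[endif\]-->)/m

text2 = text.gsub(pattern, "")
puts text2 # prints <!--[if gte mso 9]><![endif]-->
```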

In your regex, /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/, you used a positive lookahead for the first tag and a positive lookbehind for the second tag. You need to swap them.
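A side note that goes slightly beyond the answer: the missing m flag is also why the original pattern appeared to do nothing at all. A sketch in Ruby, using the sample string from the question:

```ruby
text = "<!--[if gte mso 9]><xml>\n <w:WordDocument>\n [...] \n</style>\n<![endif]-->"

# The pattern from the question: lookahead on the opening tag, lookbehind on the closing tag.
original   = /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/
# The same pattern with the m flag added.
original_m = /(?=<!\-\-\[if gte mso 9\]>)(.*?)(?<=<!\[endif\]\-\->)/m

# Without m, .*? cannot cross "\n", so nothing matches and the text comes back unchanged.
unchanged = text.gsub(original, "")
# With m, the match runs from the opening tag through the closing tag, tags included.
emptied = text.gsub(original_m, "")

puts unchanged == text # prints true
puts emptied.inspect   # prints ""
```

So adding m alone gives the question's second acceptable result (an empty string); swapping the lookarounds is what keeps the two tags in place.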

A positive lookahead, (?=...), checks that its pattern follows the current position, without including it in the match.
A positive lookbehind, (?<=...), checks that its pattern precedes the current position, without including it in the match.
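A small illustration of both assertions on a made-up string (the <b> example is only for demonstration):

```ruby
s = "<b>bold</b>"

# Lookahead: word characters that are followed by "</b>"; "</b>" is not part of the match.
inner_ahead = s[/\w+(?=<\/b>)/]
# Lookbehind: word characters that are preceded by "<b>"; "<b>" is not part of the match.
inner_behind = s[/(?<=<b>)\w+/]
# Without a lookaround, the tag is consumed and ends up in the match.
with_tag = s[/<b>\w+/]

puts inner_ahead  # prints bold
puts inner_behind # prints bold
puts with_tag     # prints <b>bold
```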

I think you meant `.*`, not `.0`. But you should use `.*?`, like the OP did. — Alan Moore, Aug 23 '16 at 01:00
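To illustrate the comment's point about .*? versus .*, here is a sketch with a made-up input containing two conditional blocks; the greedy version also deletes the text between them:

```ruby
text = "<!--[if gte mso 9]>A<![endif]-->keep me<!--[if gte mso 9]>B<![endif]-->"

# Greedy: .* runs to the LAST closing tag, swallowing "keep me" and the second opening tag.
greedy = /(?<=<!--\[if gte mso 9\]>).*(?=<!\[endif\]-->)/m
# Lazy: .*? stops at the FIRST closing tag, so each block is emptied separately.
lazy = /(?<=<!--\[if gte mso 9\]>).*?(?=<!\[endif\]-->)/m

greedy_result = text.gsub(greedy, "")
lazy_result = text.gsub(lazy, "")

puts greedy_result # prints <!--[if gte mso 9]><![endif]-->
puts lazy_result   # prints <!--[if gte mso 9]><![endif]-->keep me<!--[if gte mso 9]><![endif]-->
```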
Great answer! How can this be modified to include <!--[if gte mso 9]> and <![endif]--> as part of the capture? — Chris Barretto, Aug 02 '17 at 18:51
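One way to include the tags, sketched under the assumption that you want the whole conditional block: match the tags as ordinary literals instead of wrapping them in lookarounds, and put everything in one capture group. The surrounding <p> markup is made up for the example.

```ruby
text = "<p>before</p><!--[if gte mso 9]><xml>\n[...]\n</xml><![endif]--><p>after</p>"

# No lookarounds: both tags are consumed, so group 1 holds the full block.
block = /(<!--\[if gte mso 9\]>.*?<!\[endif\]-->)/m

captured = text[block, 1]     # the whole conditional block, tags included
stripped = text.gsub(block, "")

puts captured
puts stripped # prints <p>before</p><p>after</p>
```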
