How to write a Multi-line RegEx Expression

Question

I have a vb.net class that cleans some html before emailing the results.

Here is a sample of some html I need to remove:

    <div class="RemoveThis">
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      <br /> 
    </div>

I am already using RegEx to do most of my work now. What would the RegEx expression look like to replace the block above with nothing?

I tried the following, but something is wrong:

    ' html has all of my text
    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase)

Thanks.

score 4 · Accepted Answer · edited May 23 '17 at 12:08

Add the Singleline option:

    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

From MSDN:

Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
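
To make the effect of the option concrete, here is a minimal sketch that runs the pattern from the question with and without Singleline. The module name, the RemoveBlock helper, and the sample string are made up for illustration; only Regex.Replace and the RegexOptions values come from the question and answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    ' Hypothetical helper: applies the pattern from the question with the given options.
    Function RemoveBlock(html As String, options As RegexOptions) As String
        Return Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", options)
    End Function

    Sub Main()
        ' Illustrative input: the block to remove spans several lines.
        Dim html As String = "before" & vbLf &
            "<div class=""RemoveThis"">" & vbLf &
            "  Blah blah blah<br />" & vbLf &
            "</div>" & vbLf &
            "after"

        ' Without Singleline, "." cannot cross a line feed, so the pattern
        ' never reaches </div> and the input comes back unchanged.
        Console.WriteLine(RemoveBlock(html, RegexOptions.IgnoreCase) = html)

        ' With Singleline, "." also matches line feeds and the block is removed.
        ' The line breaks before and after the block are not part of the match.
        Console.WriteLine(RemoveBlock(html, RegexOptions.IgnoreCase Or RegexOptions.Singleline))
    End Sub
End Module
```

The first call should report that nothing changed; the second should leave only "before" and "after", separated by the two line breaks that surrounded the block.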

PS: Parsing HTML with regular expressions is discouraged. Because the lazy .*? stops at the first </div> it finds, your code will fail on nested elements like this:

    <div class="RemoveThis">
        <div>bla</div>
        <div>bla</div>
    </div>
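
A minimal sketch of that failure, assuming the outer div carries the RemoveThis class from the question and that Singleline is already applied. The module and helper names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NestedDivDemo
    ' Hypothetical helper: the pattern from the question plus the Singleline fix.
    Function RemoveBlock(html As String) As String
        Return Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "",
                             RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    End Function

    Sub Main()
        Dim html As String = "<div class=""RemoveThis"">" & vbLf &
            "    <div>bla</div>" & vbLf &
            "    <div>bla</div>" & vbLf &
            "</div>"

        ' The lazy ".*?" ends the match at the FIRST </div>, which closes the
        ' first inner div. The second inner div and the outer closing tag
        ' are left behind in the output.
        Console.WriteLine(RemoveBlock(html))
    End Sub
End Module
```

The match covers only the opening tag and the first inner div, so the result still contains one `<div>bla</div>` followed by an unbalanced `</div>`.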

edited May 23 '17 at 12:08

Community

answered Jan 12 '10 at 14:25

Heinzi

Thanks, but it is not working. The things I want to remove only have text and <br /> in them. – Bobby Ortiz Jan 12 '10 at 14:35
Are you sure? I tried it here: http://regexlib.com/RETester.aspx and it seems to work fine... – Heinzi Jan 12 '10 at 14:52
:( Yes. I am sure. I think there is something different about the .NET version. Or, it could be that my html string has a lot more text. 5K at least. – Bobby Ortiz Jan 12 '10 at 15:06
Strange. Unfortunately, I don't have Visual Studio available right now (I'm on a Linux machine at university), but I'll test it at home tonight (CET), unless someone else finds the solution earlier. – Heinzi Jan 12 '10 at 15:09

Mark Byers · Answer 2 · 2010-01-13T08:39:13.927

3

Try:

    RegexOptions.IgnoreCase Or RegexOptions.Singleline

The RegexOptions.Singleline option changes the meaning of the dot from 'match anything except new line' to 'match anything'.

Also, you should consider using an HTML parser instead of regular expressions if you need to parse HTML.
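
As a rough sketch of the parser route: the answer does not name a library, so this assumes the third-party Html Agility Pack NuGet package (not part of .NET) is referenced. The module name, the RemoveDivsByClass helper, and its parameters are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack ' assumption: third-party NuGet package, not in the original post

Module ParserDemo
    ' Hypothetical helper: removes every div whose class attribute is exactly className.
    Function RemoveDivsByClass(html As String, className As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches the XPath.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='" & className & "']")

        If nodes IsNot Nothing Then
            ' Copy to a list first so the tree is not modified while enumerating.
            For Each node As HtmlNode In nodes.ToList()
                node.Remove() ' drops the div together with any nested divs
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Calling `RemoveDivsByClass(html, "RemoveThis")` would then replace the Regex.Replace call. Because the parser works on the element tree, nested divs are removed along with their parent. Note that the XPath test shown here is an exact, case-sensitive comparison, so a div with several classes would need a different expression.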

edited Jan 13 '10 at 08:39

answered Jan 12 '10 at 14:26

Mark Byers
