Regular expression to not match a string in C#

Question

I have some HTML that I need to parse (in a large document) as text, and the portion I'm interested in looks like this:

...
<div id="whatever" class="whatever whatever">some title with <em>html</em> and other such tags in it, but never a div tag</div>
...

Now I want to extract the text inside the DIV, including the HTML tags it contains. Here's the regular expression I have so far (using groups):

<div id=\"whatever\" class=\"whatever whatever\">(?<title>[^</div>]*?)</div>

So the idea there is that I'll match the whole thing, and get a group with all the text up to the point where the </div> occurs (as there's no other identifying factor for the end of the string).

The ^ in [] doesn't work because a character class excludes any one of those characters, not the string "</div>" as a whole, which is what I want. Any ideas how I make this work?

Just don't do it. Use an HTML parser such as HtmlAgilityPack instead. Duplicate: [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — spender, Jun 11 '12 at 00:14

score 0 · Answer 1 · answered Jun 11 '12 at 00:28


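// Requires: using System; using System.Text.RegularExpressions;
// The lazy (.*?) stops at the first </div>; Singleline (an addition here) lets '.' also match newlines.
Match m = Regex.Match(s, "<div id=\"whatever\" class=\"whatever whatever\">(.*?)</div>", RegexOptions.Singleline);
if (m.Success) Console.WriteLine(m.Groups[1].Value);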
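
For a fuller picture, here is a minimal sketch of two ways to capture "everything up to the string </div>" while keeping the named group from the question. The class and method names (TitleExtractor, ExtractLazy, ExtractTempered) are hypothetical, and the sample input is made up for illustration. The first pattern uses a lazy quantifier; the second uses a negative lookahead, which is the closest regex equivalent of "not this string". Both assume the target div never contains a nested div, as the question states.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TitleExtractor
{
    // Lazy version: (?<title>.*?) grows one character at a time
    // and stops as soon as the following </div> can match.
    private static readonly Regex Lazy = new Regex(
        "<div id=\"whatever\" class=\"whatever whatever\">(?<title>.*?)</div>",
        RegexOptions.Singleline);

    // Lookahead version: (?!</div>) is a negative lookahead, so each '.'
    // is consumed only if the string "</div>" does not start at that position.
    private static readonly Regex Tempered = new Regex(
        "<div id=\"whatever\" class=\"whatever whatever\">(?<title>(?:(?!</div>).)*)</div>",
        RegexOptions.Singleline);

    // Returns the inner HTML of the first matching div, or null if nothing matches.
    public static string ExtractLazy(string html)
    {
        Match m = Lazy.Match(html);
        return m.Success ? m.Groups["title"].Value : null;
    }

    // Same contract as ExtractLazy, using the lookahead pattern.
    public static string ExtractTempered(string html)
    {
        Match m = Tempered.Match(html);
        return m.Success ? m.Groups["title"].Value : null;
    }

    public static void Main()
    {
        // Made-up sample input with a second div after the one we want.
        string html = "<p>intro</p><div id=\"whatever\" class=\"whatever whatever\">"
                    + "some title with <em>html</em> in it</div><div>other</div>";

        Console.WriteLine(ExtractLazy(html));      // some title with <em>html</em> in it
        Console.WriteLine(ExtractTempered(html));  // some title with <em>html</em> in it
    }
}
```

For this input the two patterns capture the same text. The lookahead form is mainly useful when the "stop" string has to be excluded in the middle of a larger pattern. Neither handles nested divs or attribute order changes, which is why the comment above suggests an HTML parser for anything less regular.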


Eugen Rieck
