What is wrong with my regular expression?

Question

How can I go about extracting values such as the following?

<div class="detail"> Hello </div>

<div class="detail"> World </div>

     string x = " <div class="results-list clearfix">
                 <div class="detail">   Hello
                 </div> 
       </div>
       <div class="results-list clearfix">
                 <div class="detail">   World
                 </div> 
       </div>          
     ";

    String pattern = @"<div class=""results-list clearfix"">(?<Content>[^<]*)</div>";

    Regex rx = new Regex(pattern,RegexOptions.Multiline);
    Match m = rx.Match(x);

    while (m.Success)
    {
        string zz =  m.Groups["Content"].Value;
        m = m.NextMatch();
    }

Your `string x` value is not valid C# - you need to use a verbatim string literal (start with `@`) and escape the inner quotes `"`. — Oded, Feb 08 '11 at 13:37
Have a look at this thread. http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not — Mia Clarke, Feb 08 '11 at 13:46
@Banang: Wrong. Take a look at [this thread](http://stackoverflow.com/questions/4933611/can-extended-regex-implementations-parse-html/4934590#4934590). — tchrist, Feb 08 '11 at 15:00
@tchrist I do not agree that my comment is wrong at all. I linked to a thread that explains why parsing html using regular expressions is a *bad idea*. You linked to an answer written by you where you say that it is a bad idea, but that it *can be done*. These are not conflicting ideas. — Mia Clarke, Feb 08 '11 at 15:18
@Banang: Your linked thread mistakenly asserts *Entire HTML parsing is not possible with regular expressions, since it depends on matching the opening and the closing tag which is not possible with regexps.* That is not in the least bit true with modern patterns, as the very last line of my own cited answer trivially proves by using `s/\((?:[^()]*+|(?0))*\)//g` to delete all opening and closing parens and their contents, **recursively**. It is therefore no longer a theoretical matter, merely a practical one: the theory allows it while the practice often advises against it. — tchrist, Feb 08 '11 at 15:27
@Banang: Furthermore, even if it *were* true (and it’s not), it would not apply because it is talking about the parsing of “entire HTML”, which is nothing like the case here. — tchrist, Feb 08 '11 at 15:31
@tchrist I will not argue with you over this. I find your tone utterly unpleasant and argumentative. If you care so deeply about this issue that you feel the need to be verbose, there is nothing I can say to make you feel any differently. — Mia Clarke, Feb 08 '11 at 15:37
@Banang - While tchrist may be a bit zealous sometimes, he is definitely correct. Today the relation between regular expressions and regular languages is mostly rhythmic (i.e.: they sound similar). A short example is `(.*)\1`, which is supported by most flavors and clearly isn't regular, and other extensions like recursive matching make them even more powerful. The idea that you cannot match a construct because it isn't *regular* is often stated, but is simply wrong. — Kobi, Feb 09 '11 at 05:21
@Kobi, thank you for that explanation. tchrist could definitely learn from your communication skills. — Mia Clarke, Feb 09 '11 at 08:17
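
A minimal sketch of the point in Oded's comment above, assuming the intent was to embed the HTML as a C# verbatim string literal. The names `VerbatimLiteralDemo` and `BuildHtml` are made up for illustration and do not come from the thread.

```csharp
using System;

public static class VerbatimLiteralDemo
{
    // A verbatim literal (@"...") may span several lines.
    // Inside it, each doubled quote ("") stands for one literal quote character.
    public static string BuildHtml()
    {
        return @"<div class=""results-list clearfix"">
    <div class=""detail"">   Hello
    </div>
</div>";
    }

    public static void Main()
    {
        string x = BuildHtml();
        // The doubled quotes in the source collapse to single quotes in the value.
        Console.WriteLine(x.Contains("class=\"detail\""));  // prints True
    }
}
```

The `pattern` string in the question is already written this way (an `@` prefix with doubled quotes), so under this reading only `x` would need the change.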

score 1 · Answer 1 · answered Feb 08 '11 at 13:36

I think this is your problem: `""results-list clearfix""`. As you are using a literal string, you can remove the extra `"`s.

Neil Knight


score 0 · Answer 2 · answered Feb 08 '11 at 13:44

It is a bad idea to use regular expressions for this kind of parsing. Use an XML parser for this particular scenario. I suggest LINQ to XML, i.e. XElement.Parse(...)

Do not forget to wrap your HTML in a single root element, though.
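
A rough sketch of the LINQ to XML approach suggested above, assuming the fragment is well-formed XML (real-world HTML with unclosed tags or entities such as `&nbsp;` would make `XElement.Parse` throw). The wrapper element name `root` and the names `XmlParseDemo` and `ExtractDetails` are arbitrary choices for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlParseDemo
{
    public static List<string> ExtractDetails(string fragment)
    {
        // The fragment has two top-level divs, so wrap it in a single root
        // element to make it a well-formed XML document.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // Pick every <div class="detail"> and return its trimmed text.
        return root.Descendants("div")
                   .Where(d => (string)d.Attribute("class") == "detail")
                   .Select(d => d.Value.Trim())
                   .ToList();
    }

    public static void Main()
    {
        string x = @"<div class=""results-list clearfix"">
    <div class=""detail"">   Hello
    </div>
</div>
<div class=""results-list clearfix"">
    <div class=""detail"">   World
    </div>
</div>";

        foreach (string value in ExtractDetails(x))
        {
            Console.WriteLine(value);  // prints Hello, then World
        }
    }
}
```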

Manu


**This is not true!** It is *perfectly reasonable* to use regexes on small bits of captive ᴀᴋᴀ tame X/HTML like this. It’s far more reasonable than the alternative, which has about a 10,000:1 blowup and is fiddly to boot. This is the perfect situation for applying regexes to X/HTML. – tchrist Feb 08 '11 at 14:15

score 0 · Answer 3 · answered Feb 08 '11 at 14:57

Try this pattern with the `RegexOptions.Singleline` option:

string pattern = "<div\\sclass=\"results-list clearfix\">\\s*(?<Content><div[^>]*>.*?</div>)";
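
A sketch of how the pattern above might be applied, assuming the input uses a single space between `div` and `class` as in the question. `AnswerPattern` is the pattern from this answer; `TextOnlyPattern` is a hypothetical variant, not part of the answer, that moves the capture group inside the inner div so only the text is returned. The names `SinglelineRegexDemo` and `Capture` are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglelineRegexDemo
{
    // Pattern from the answer: Content captures the whole inner <div>...</div>.
    public const string AnswerPattern =
        "<div\\sclass=\"results-list clearfix\">\\s*(?<Content><div[^>]*>.*?</div>)";

    // Hypothetical variant (an assumption, not from the answer):
    // Content captures only the text inside the inner div.
    public const string TextOnlyPattern =
        "<div\\sclass=\"results-list clearfix\">\\s*<div[^>]*>\\s*(?<Content>.*?)\\s*</div>";

    public static List<string> Capture(string input, string pattern)
    {
        var results = new List<string>();
        // Singleline makes '.' match newlines as well, so .*? can span lines.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups["Content"].Value);
        }
        return results;
    }

    public static void Main()
    {
        string x = @"<div class=""results-list clearfix"">
    <div class=""detail"">   Hello
    </div>
</div>
<div class=""results-list clearfix"">
    <div class=""detail"">   World
    </div>
</div>";

        foreach (string value in Capture(x, TextOnlyPattern))
        {
            Console.WriteLine(value);  // prints Hello, then World
        }
    }
}
```

For comparison, the `[^<]*` in the question's pattern stops at the first `<`, so it cannot get past the nested `<div class="detail">` tag, and `RegexOptions.Multiline` only changes how `^` and `$` behave.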

marzouka

