
How can I get the value of each `detail` div, e.g.:

  1. `<div class="detail"> Hello </div>`
  2. `<div class="detail"> World </div>`

         string x = " <div class="results-list clearfix">
                     <div class="detail">   Hello
                     </div> 
           </div>
           <div class="results-list clearfix">
                     <div class="detail">   World
                     </div> 
           </div>          
         ";
    
        String pattern = @"<div class=""results-list clearfix"">(?<Content>[^<]*)</div>";
    
        Regex rx = new Regex(pattern, RegexOptions.Multiline);
        Match m = rx.Match(x);

        while (m.Success)
        {
            string zz = m.Groups["Content"].Value;
            m = m.NextMatch();
        } 
    
Joachim Sauer
beginner
  • Your `string x` value is not valid C#: you need to use a verbatim string literal (start it with `@`) and escape the inner quotes `"` by doubling them (`""`). – Oded Feb 08 '11 at 13:37
  • Have a look at this thread. http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – Mia Clarke Feb 08 '11 at 13:46
  • @Banang: Wrong. Take a look at [this thread](http://stackoverflow.com/questions/4933611/can-extended-regex-implementations-parse-html/4934590#4934590). – tchrist Feb 08 '11 at 15:00
  • @tchrist I do not agree that my comment is wrong at all. I linked to a thread that explains why parsing html using regular expressions is a *bad idea*. You linked to an answer written by you where you say that it is a bad idea, but that it *can be done*. These are not conflicting ideas. – Mia Clarke Feb 08 '11 at 15:18
  • @Banang: Your linked thread mistakenly asserts *Entire HTML parsing is not possible with regular expressions, since it depends on matching the opening and the closing tag which is not possible with regexps.* That is not in the least bit true with modern patterns, as the very last line of my own cited answer trivially proves by using `s/\((?:[^()]*+|(?0))*\)//g` to delete all opening and closing parens and their contents, **recursively**. It is therefore no longer a theoretical matter, merely a practical one: the theory allows it while the practice often advises against it. – tchrist Feb 08 '11 at 15:27
  • @Banang: Furthermore, even if it *were* true (and it’s not), it would not apply because it is talking about the parsing of “entire HTML”, which is nothing like the case here. – tchrist Feb 08 '11 at 15:31
  • @tchrist I will not argue with you over this. I find your tone utterly unpleasant and argumentative. If you care so deeply about this issue that you feel the need to be verbose, there is nothing I can say to make you feel any differently. – Mia Clarke Feb 08 '11 at 15:37
  • @Banang - While tchrist may be a bit zealous sometimes, he is definitely correct. Today the relation between regular expressions and regular languages is mostly rhythmic (i.e.: they sound similar). A short example is `(.*)\1`, which is supported by most flavors and clearly isn't regular, and other extensions like recursive matching make them even more powerful. The idea that you cannot match a construct because it isn't *regular* is often stated, but is simply wrong. – Kobi Feb 09 '11 at 05:21
  • @Kobi, thank you for that explanation. tchrist could definitely learn from your communication skills. – Mia Clarke Feb 09 '11 at 08:17
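Both regex claims in the comments above can be tried in .NET. The following is a minimal sketch that assumes a C# 9+ console program with top-level statements. The helper names `IsDoubled` and `IsBalanced` are illustrative only. .NET does not support Perl's `(?0)` recursion, so the sketch substitutes a .NET balancing group to check nested parentheses:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: (.*)\1 accepts any text followed by an exact copy of
// itself ("ww"), a language that is not regular.
static bool IsDoubled(string s) => Regex.IsMatch(s, @"^(.*)\1$");

// Hypothetical helper: .NET balancing groups instead of (?0) recursion.
// Each "(" pushes a capture onto the "open" stack, each ")" pops one (and
// fails if the stack is empty), and (?(open)(?!)) fails if any "(" is unclosed.
static bool IsBalanced(string s) =>
    Regex.IsMatch(s, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

Console.WriteLine(IsDoubled("abcabc"));   // True
Console.WriteLine(IsDoubled("abcab"));    // False
Console.WriteLine(IsBalanced("(a(b)c)")); // True
Console.WriteLine(IsBalanced("(a(b c)")); // False
Console.WriteLine(IsBalanced(")("));      // False
```

The balancing-group pattern only validates nesting. It is the closest .NET counterpart to the recursive Perl pattern, not a line-for-line translation of tchrist's substitution.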

3 Answers


I think this is your problem: `""results-list clearfix""`. As you are using a verbatim (`@`) string literal, you can remove the extra `"`s.
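To see what the doubled quotes become, you can compare the verbatim literal from the question's pattern with the same text written as a regular literal. The following is a minimal sketch that assumes C# 9+ top-level statements. The variable names are illustrative:

```csharp
using System;

// Verbatim literal (prefix @): "" stands for one quote character and
// backslashes are not escape characters.
string verbatim = @"<div class=""results-list clearfix"">";

// The same text as a regular literal, using \" for each quote.
string regular = "<div class=\"results-list clearfix\">";

Console.WriteLine(verbatim);                  // <div class="results-list clearfix">
Console.WriteLine(verbatim == regular);       // True
Console.WriteLine(verbatim.Contains("\"\"")); // False
```

If both literals print the same text, each `""` pair in the verbatim pattern reaches `Regex` as a single `"`.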

Neil Knight

It is a bad idea to use regular expressions for this kind of parsing. Use an XML parser for this particular scenario; I suggest LINQ to XML, e.g. `XElement.Parse(...)`.

Don't forget to wrap your HTML in a single root element, though.
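The following is a minimal sketch of the LINQ to XML approach. It assumes the markup is well-formed XML, which real-world HTML often is not. It is written as a C# 9+ console program with top-level statements. The `<root>` wrapper element and the variable names are illustrative choices:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The question's markup (as a valid verbatim literal). It has two top-level
// elements, so it is wrapped in an illustrative <root> element before parsing.
string html = @"<div class=""results-list clearfix"">
    <div class=""detail"">   Hello
    </div>
</div>
<div class=""results-list clearfix"">
    <div class=""detail"">   World
    </div>
</div>";

XElement root = XElement.Parse("<root>" + html + "</root>");

// Keep every <div> whose class attribute is exactly "detail" (the outer
// "results-list clearfix" divs are skipped) and trim its text content.
string[] details = root
    .Descendants("div")
    .Where(d => (string)d.Attribute("class") == "detail")
    .Select(d => d.Value.Trim())
    .ToArray();

foreach (string text in details)
{
    Console.WriteLine(text); // Hello, then World
}
```

Casting `XAttribute` to `string` yields `null` when the attribute is missing, so divs without a `class` simply fail the filter instead of throwing.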

Manu
  • **This is not true!** It is *perfectly reasonable* to use regexes on small bits of captive ᴀᴋᴀ tame X/HTML like this. It’s far more reasonable than the alternative, which has about a 10,000:1 blowup and is fiddly to boot. This is the perfect situation for applying regexes to X/HTML. – tchrist Feb 08 '11 at 14:15

Try this pattern with the `RegexOptions.Singleline` option:

    string pattern = "<div\\sclass=\"results-list clearfix\">\\s*(?<Content><div[^>]*>.*?</div>)";
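The following sketch applies the answer's pattern to the question's markup. It assumes a C# 9+ console program with top-level statements. The sample string and the second "text only" regex are illustrative additions, not part of the answer. The `Content` group captures the whole inner `<div>`, tags included:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string html = @"<div class=""results-list clearfix"">
    <div class=""detail"">   Hello
    </div>
</div>
<div class=""results-list clearfix"">
    <div class=""detail"">   World
    </div>
</div>";

// The answer's pattern: Singleline lets . match newlines, and the lazy .*?
// stops at the first </div> after each inner <div ...>.
string pattern = "<div\\sclass=\"results-list clearfix\">\\s*(?<Content><div[^>]*>.*?</div>)";

string[] blocks = Regex.Matches(html, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups["Content"].Value)
    .ToArray();

// Hypothetical follow-up step: extract only the trimmed text of each
// captured inner <div>.
string[] texts = blocks
    .Select(b => Regex.Match(b, @"<div[^>]*>(?<Text>.*?)</div>", RegexOptions.Singleline)
        .Groups["Text"].Value.Trim())
    .ToArray();

foreach (string t in texts)
{
    Console.WriteLine(t); // Hello, then World
}
```

`.Cast<Match>()` is used so the LINQ calls also work on older .NET Framework versions, where `MatchCollection` is only a non-generic `IEnumerable`.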
marzouka