How to get a div with a specific HTML class in C#

Question

I am using C# to download a whole HTML page, but I would like to isolate just one specific div:

    <div class="row row-dia-obituario">

I'm using this code to fetch the HTML, but it returns the full HTML of the page:

    request = (HttpWebRequest)WebRequest.Create("https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal");
    request.Proxy = webProxy;
    request.Timeout = 20000;
    request.Method = "GET";
    request.KeepAlive = true;
    response = (HttpWebResponse)request.GetResponse();
    sr = new StreamReader(response.GetResponseStream(), encoding);
    html = sr.ReadToEnd();
    string htmlaux = Regex.Replace(html, "&quot;", "").Trim();
    html = System.Net.WebUtility.HtmlDecode(htmlaux);

It is very strange that researching this topic led you to using [regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/)... You may want to search again and look for answers using HtmlAgilityPack or any other real parser. — Alexei Levenkov, Nov 01 '16 at 12:48
Of course it brings the full HTML content; you aren't filtering anything. By the way, regex is definitely [not the way to go](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Matias Cicero, Nov 01 '16 at 12:50

score 1 · Accepted Answer · answered Nov 01 '16 at 12:50

Don't use regex to parse HTML. Use an HTML parser instead, such as Html Agility Pack:

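    // Requires: using System.Linq; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    // GetAttributeValue returns the default ("") when the attribute is missing,
    // so divs without a class attribute do not cause a NullReferenceException.
    var divNode = doc.DocumentNode.Descendants("div")
        .FirstOrDefault(x => x.GetAttributeValue("class", "") == "row row-dia-obituario");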
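
The same lookup can also be written as an XPath query, which may be more forgiving if the page changes the order of the class names or adds more classes. The following is only a sketch: it assumes the HtmlAgilityPack NuGet package is installed, and the inline `html` string is a made-up stand-in for the page source that the question reads with `sr.ReadToEnd()`.

```csharp
using System;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is referenced

class Program
{
    static void Main()
    {
        // Hypothetical sample markup; in the question this would be the downloaded page.
        string html =
            "<html><body>" +
            "<div class=\"row\">other</div>" +
            "<div class=\"row row-dia-obituario\"><p>target</p></div>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match any div whose class list contains the token "row-dia-obituario",
        // regardless of token order or surrounding whitespace (standard XPath 1.0 idiom).
        const string xpath =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' row-dia-obituario ')]";

        HtmlNode divNode = doc.DocumentNode.SelectSingleNode(xpath);

        // SelectSingleNode returns null when nothing matches, so check before using it.
        if (divNode == null)
        {
            Console.WriteLine("div not found");
            return;
        }

        // OuterHtml includes the div tag itself; use InnerHtml for just its contents.
        Console.WriteLine(divNode.OuterHtml);
    }
}
```

If you need every matching div rather than the first one, `SelectNodes` takes the same XPath expression. Check its result for null before iterating, since it returns null rather than an empty collection when nothing matches.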
