I am using C# to fetch a whole HTML page, but I would like to isolate just one specific div:

<div class="row row-dia-obituario">

I'm using this code to get the HTML; it returns the full HTML of the page:

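    // webProxy (IWebProxy) and encoding (Encoding) are assumed to be defined elsewhere.
    var request = (HttpWebRequest)WebRequest.Create("https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal");
    request.Proxy = webProxy;
    request.Timeout = 20000; // milliseconds
    request.Method = "GET";
    request.KeepAlive = true;
    string html;
    // Dispose the response and reader once the body has been read.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var sr = new StreamReader(response.GetResponseStream(), encoding))
    {
        html = sr.ReadToEnd();
    }
    // Strip &quot; entities, then decode the remaining HTML entities.
    string htmlaux = Regex.Replace(html, "&quot;", "").Trim();
    html = System.Net.WebUtility.HtmlDecode(htmlaux);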
SᴇM
  • What do you mean "catch a specific html class with c#"? – SᴇM Nov 01 '16 at 12:47
  • It is very strange that researching this topic led you to using [regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/)... You may want to search again and look for answers using Html Agility Pack or any other real parser. – Alexei Levenkov Nov 01 '16 at 12:48
  • Of course it brings the full HTML content, you aren't even filtering. By the way, regex is definitely [not the way to go](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Matias Cicero Nov 01 '16 at 12:50

1 Answer


Don't use regex to parse HTML. Use an HTML parser instead; you can look into Html Agility Pack:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

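    // Descendants("div") limits the search to <div> elements. GetAttributeValue
    // returns "" when the class attribute is missing, which avoids the
    // NullReferenceException that x.Attributes["class"].Value would throw.
    var divNode = doc.DocumentNode.Descendants("div")
                     .FirstOrDefault(x => x.GetAttributeValue("class", "") == "row row-dia-obituario");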
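
As an alternative to the LINQ query, the same lookup can probably be written as an XPath expression with Html Agility Pack's `SelectSingleNode`. The following is a minimal sketch, not a definitive implementation: the `DivExtractor` class, the `FindDivByClass` helper, and the sample HTML string are hypothetical names introduced only for illustration, and the sketch assumes the HtmlAgilityPack NuGet package is referenced and that the class attribute matches the given string exactly.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DivExtractor
{
    // Hypothetical helper: returns the outer HTML of the first <div> whose
    // class attribute equals className exactly, or null when none is found.
    // Assumes className contains no apostrophe, since it is embedded in the XPath.
    public static string FindDivByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the expression.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='" + className + "']");

        return node?.OuterHtml;
    }

    public static void Main()
    {
        // Sample input standing in for the downloaded page.
        string sample =
            "<html><body>" +
            "<div class=\"row row-dia-obituario\"><p>text</p></div>" +
            "<div>other</div>" +
            "</body></html>";

        string result = FindDivByClass(sample, "row row-dia-obituario");
        Console.WriteLine(result ?? "div not found");
    }
}
```

In either form, check the result for null before using it, because the page may not contain the div. Use `InnerHtml` instead of `OuterHtml` if only the contents of the div are needed.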
mybirthname