I'm using the HTML Agility Pack in a Windows Forms application with C#, and it works well for searching HTML pages.
However, the query returns the whole HTML of the page. I only need the content of the post, because the rest is unnecessary links and text.
After loading the HTML, the content that matters is everything between

<span class="update-date">23/06/2019 16h17</span></span></p>

and

<p class="col-lg-24">
I tried using a regex, but I could not get it to work.
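For comparison, here is a minimal sketch of the regex route that uses only System.Text.RegularExpressions. It assumes the page markup matches the two markers above character for character (same attribute quoting, a single class value, no extra attributes). The class name RegexExtractor, the helper ExtractText and the sample HTML are hypothetical, and the tag stripping is deliberately crude. That fragility is the usual reason to prefer a real HTML parser such as the Agility Pack for this job.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexExtractor
{
    // Hypothetical pattern: captures the raw HTML between the </p> that closes
    // the paragraph holding the "update-date" span and the first
    // <p class="col-lg-24">. Any change to the markup will break the match.
    private static readonly Regex Between = new Regex(
        @"<span class=""update-date"">.*?</span>\s*</span>\s*</p>(?<body>.*?)<p class=""col-lg-24"">",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Crude tag stripper for a quick check; this is not an HTML parser.
    private static readonly Regex Tags = new Regex("<[^>]+>");

    // Returns the text between the two markers, or an empty string if they are not found.
    public static string ExtractText(string html)
    {
        var match = Between.Match(html);
        if (!match.Success) return string.Empty;

        var withoutTags = Tags.Replace(match.Groups["body"].Value, " ");
        // Collapse the whitespace left behind by removed tags and line breaks.
        return Regex.Replace(withoutTags, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        var html = "<p><span><span class=\"update-date\">23/06/2019 16h17</span></span></p>" +
                   "<p>First paragraph.</p><p>Second paragraph.</p>" +
                   "<p class=\"col-lg-24\">Related links</p>";
        Console.WriteLine(ExtractText(html)); // prints "First paragraph. Second paragraph."
    }
}
```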
Am I using .SelectNodes the wrong way for this case?
Here's an example (based on https://dotnetfiddle.net/ltDevV):
// @nuget: HtmlAgilityPack
using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var url =
            @"https://economia.uol.com.br/noticias/redacao/2019/06/23/aposentadoria-pensao-camara-deputados.htm";

        // Download and parse the page.
        HtmlWeb web = new HtmlWeb();
        var htmlDoc = web.Load(url);

        // "//p" selects every <p> in the whole document,
        // not only the paragraphs that belong to the post.
        var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//p");
        if (htmlNodes != null)
        {
            foreach (var node in htmlNodes)
            {
                Console.WriteLine(node.OuterHtml);
            }
        }
        else
        {
            // SelectNodes returned null: no <p> elements were found.
            Console.WriteLine("Oh OK.");
        }
    }
}
In the end, I want only the content between <span class="update-date">23/06/2019 16h17</span></span></p> and <p class="col-lg-24">.
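One way to stay with the Agility Pack is to use SelectSingleNode for the start marker and a relative SelectNodes("following-sibling::*") to walk forward until the end marker. This is a sketch under stated assumptions, not a confirmed fix for that page. It assumes the <p> holding the update-date span and the <p class="col-lg-24"> are siblings under the same parent, and that the class attribute is exactly "col-lg-24". The class PostExtractor, the method ExtractPost and the inline sample HTML are hypothetical; to test against the live page, replace LoadHtml with new HtmlWeb().Load(url) as in the code above.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PostExtractor
{
    // Returns the text of each element that follows the <p> holding the
    // "update-date" span, stopping at the first <p class="col-lg-24">.
    // Assumes both markers are siblings under the same parent element.
    public static List<string> ExtractPost(HtmlDocument doc)
    {
        var result = new List<string>();

        // Start marker: the <p> that contains <span class="update-date">.
        var start = doc.DocumentNode.SelectSingleNode("//p[.//span[@class='update-date']]");
        if (start == null) return result;

        // Relative XPath evaluated from the start node: all element siblings after it.
        var siblings = start.SelectNodes("following-sibling::*");
        if (siblings == null) return result; // by default SelectNodes returns null on no match

        foreach (var node in siblings)
        {
            // End marker: stop at the first <p class="col-lg-24">.
            if (node.Name == "p" && node.GetAttributeValue("class", "") == "col-lg-24")
                break;

            // InnerText drops the tags; DeEntitize decodes entities such as &amp;.
            var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0) result.Add(text);
        }
        return result;
    }

    public static void Main()
    {
        // Inline sample mirroring the two markers quoted in the question.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div>" +
            "<p><span><span class=\"update-date\">23/06/2019 16h17</span></span></p>" +
            "<p>First paragraph.</p>" +
            "<p>Second paragraph.</p>" +
            "<p class=\"col-lg-24\">Related links</p>" +
            "</div>");

        foreach (var paragraph in ExtractPost(doc))
            Console.WriteLine(paragraph);
    }
}
```

Walking siblings, instead of a single XPath that intersects preceding-sibling and following-sibling, makes the loop stop at the first end marker and also keeps non-<p> content such as headings or lists. If the real class attribute holds several classes, the equality checks would need something like contains(@class, 'col-lg-24') in XPath and a matching string check in C#.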