
I'm using the Html Agility Pack in a Windows Forms application with C#, and searching the HTML page works well.

However, the query returns the HTML of the whole page, and I only need the content of the post; the rest is unnecessary links and text.

The content that matters, after reading the HTML, is between:
<span class="update-date">23/06/2019 16h17</span></span></p>

and <p class="col-lg-24">.

I tried to use regex, but I did not succeed.

Am I using the wrong .SelectNodes expression for this case?

Here's an example: (Example based on https://dotnetfiddle.net/ltDevV)

// @nuget: HtmlAgilityPack
using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Address of the article to load (this is a URL, not HTML text).
        var url = @"https://economia.uol.com.br/noticias/redacao/2019/06/23/aposentadoria-pensao-camara-deputados.htm";

        HtmlWeb web = new HtmlWeb();

        var htmlDoc = web.Load(url);

        // "//p" matches every <p> on the page, not only the ones in the post body.
        var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//p");

        // SelectNodes returns null when nothing matches.
        if (htmlNodes != null)
        {
            foreach (var node in htmlNodes)
            {
                Console.WriteLine(node.OuterHtml);
            }
        }
        else
        {
            Console.WriteLine("Oh OK.");
        }
    }
}

As a final result, I hope to get only the content that is between the tags
<span class="update-date">23/06/2019 16:17</span></span></p> and <p class="col-lg-24">.
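
To make the goal concrete, here is a minimal sketch of one way this might be done with Html Agility Pack alone, without regex. It rests on assumptions not confirmed by the page: that the <p> containing span.update-date and the <p class="col-lg-24"> are siblings under the same parent, and that both class attributes match exactly. PostExtractor, ExtractPostParagraphs and the sample markup in Demo are hypothetical names and data used only for illustration.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PostExtractor
{
    // Collects the text of the elements that sit between the <p> holding
    // <span class="update-date"> and the next <p class="col-lg-24"> sibling.
    // Assumption: both markers share the same parent element.
    public static List<string> ExtractPostParagraphs(HtmlDocument doc)
    {
        var result = new List<string>();

        // Start marker: the <p> that contains the update-date span.
        var start = doc.DocumentNode.SelectSingleNode("//p[.//span[@class='update-date']]");
        if (start == null)
        {
            return result; // marker not found, nothing to extract
        }

        for (var node = start.NextSibling; node != null; node = node.NextSibling)
        {
            // End marker: stop at the first <p class="col-lg-24">.
            if (node.Name == "p" && node.GetAttributeValue("class", "") == "col-lg-24")
            {
                break;
            }

            // Skip whitespace text nodes and comments between elements.
            if (node.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }

        return result;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Hypothetical markup that mimics the two markers from the question.
        var sample =
            "<div>" +
            "<p><span><span class=\"update-date\">23/06/2019 16h17</span></span></p>" +
            "<p>First paragraph of the post.</p>" +
            "<p>Second paragraph of the post.</p>" +
            "<p class=\"col-lg-24\">Unrelated links</p>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(sample);

        foreach (var paragraph in PostExtractor.ExtractPostParagraphs(doc))
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

With the real page, the document returned by web.Load(...) could be passed to ExtractPostParagraphs instead of the sample. If the two markers turn out not to be siblings on the live page, the loop would need to start from a different ancestor.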

Nanhydrin
Mike
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Alessandro Da Rugna Jun 24 '19 at 11:56
  • Could you provide the resulting HTML you want to parse with the regex? This question might not just concern C#-ers. If you provide the resulting HTML, someone could solve the regex for you. – Smytt Jun 24 '19 at 12:11
  • It is not a duplicate... thanks – Mike Jun 24 '19 at 12:24
  • This is an example of regex code, but if possible, I would prefer to solve it with html-agility-pack. var Msgfilter = MensagemPost; var regex = new System.Text.RegularExpressions.Regex("Begin(.*?)>(.*?)End"); var m = regex.Match(Msgfilter); Console.Write(m.Groups[2].Value); // will print -> World var FinalMsg = m.Groups[2].Value; – Mike Jun 24 '19 at 12:25

0 Answers