Parsing innertext of html

Question

This is part of the HTML that I am parsing:

<li><a href="http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421" target="_blank">http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421</a></li>

I want to get http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421 as the output.

So far I have tried:

HtmlDocument document = new HtmlDocument();
document.LoadHtml(responseFromServer);

var link = document.DocumentNode.SelectSingleNode("//a");

if (link != null)
{
    if(link.innerText.Contains("ticket"))
    {
        Console.WriteLine(link.InnerText);
    }
}

...but the output is null (no inner text is found).

use link.innerText.Contains – Mahesh Malpani Feb 17 '16 at 06:28

har07 · Accepted Answer · 2016-02-17T06:37:00.667

That's probably because the first link in your HTML document, as returned by SelectSingleNode(), doesn't contain the text "ticket". You can check for the target text in XPath directly, like so:

var link = document.DocumentNode.SelectSingleNode("//a[contains(.,'ticket')]");

if (link != null)
{
    Console.WriteLine(link.InnerText);
}
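To see why //a and //a[contains(.,'ticket')] give different results, here is a small self-contained sketch. It assumes a hypothetical page in which a "Home" link (with a made-up URL) appears before the ticket link. So that it runs without a NuGet package, it uses the .NET System.Xml.XPath extensions on XDocument instead of HtmlAgilityPack; that only works because this fragment happens to be well-formed XML. The XPath expressions are plain XPath 1.0, the same ones passed to SelectSingleNode above.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class TicketLinkDemo
{
    // Hypothetical markup: a "Home" link precedes the ticket link.
    // It is well-formed XML, so XDocument can load it; arbitrary real-world
    // HTML usually is not, which is what HtmlAgilityPack is for.
    public const string Sample =
        "<ul>" +
        "<li><a href=\"http://some.link.com/home\">Home</a></li>" +
        "<li><a href=\"http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421\" target=\"_blank\">" +
        "http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421</a></li>" +
        "</ul>";

    // Returns the text content of the first element matched by the XPath
    // expression (in document order), or null when nothing matches.
    public static string FirstMatchText(string xml, string xpath)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement element = doc.XPathSelectElement(xpath);
        return element?.Value;
    }

    public static void Main()
    {
        // "//a" matches every anchor; selecting a single node keeps the first
        // one in document order, here "Home", which does not contain "ticket".
        Console.WriteLine(FirstMatchText(Sample, "//a"));

        // The predicate filters anchors on their text before the first one is taken.
        Console.WriteLine(FirstMatchText(Sample, "//a[contains(., 'ticket')]"));
    }
}
```

The first call prints "Home" and the second prints the ticket URL, which is the behaviour the answer describes: the filter has to be part of the XPath (or applied to every anchor) rather than checked only on the first match.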

Or, using LINQ style if you prefer:

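// SelectNodes returns null (not an empty collection) when nothing matches,
// hence the ?. operator; FirstOrDefault needs "using System.Linq;"
var link = document.DocumentNode
                   .SelectNodes("//a")
                   ?.FirstOrDefault(o => o.InnerText.Contains("ticket"));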

if (link != null)
{
    Console.WriteLine(link.InnerText);
}
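Both versions above read InnerText, which works here only because the anchor's visible text repeats the URL. If the text could differ (for example, a hypothetical "Open session" label), matching on and reading the href attribute is more direct. The sketch below illustrates this with the .NET System.Xml.XPath extensions on a well-formed hypothetical fragment, not with HtmlAgilityPack, so that it runs without extra packages.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class TicketHrefDemo
{
    // Hypothetical markup in which the visible link text differs from the URL.
    public const string Sample =
        "<ul><li><a href=\"http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421\" " +
        "target=\"_blank\">Open session</a></li></ul>";

    // Selects the first anchor whose href attribute contains "ticket" and
    // returns that attribute's value, or null when no anchor matches.
    public static string TicketHref(string xml)
    {
        XElement link = XDocument.Parse(xml).XPathSelectElement("//a[contains(@href, 'ticket')]");
        return link?.Attribute("href")?.Value;
    }

    public static void Main()
    {
        Console.WriteLine(TicketHref(Sample) ?? "no ticket link found");
    }
}
```

With HtmlAgilityPack the equivalent would be SelectSingleNode("//a[contains(@href,'ticket')]") followed by link.GetAttributeValue("href", string.Empty).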

XPath solution is exactly what I need. Short and sweet. Thank you. – Tagyoureit, Feb 17 '16 at 06:46

score 1 · Answer 2 · answered Feb 17 '16 at 06:29


The code you posted won't compile, because innerText is not defined (the property is InnerText, and C# is case-sensitive). If you try this code, you'll probably get what you asked for:

HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

var link = document.DocumentNode.SelectSingleNode("//a");

if (link != null)
{
    if(link.InnerText.Contains("ticket"))
    {
        Console.WriteLine(link.InnerText);
    }
}

Serhiy Chupryk

Thank you, it's a typo from copying/pasting. This returns NULL, though. I have settled on the XPath solution posted above. Thank you for your time. – Tagyoureit Feb 17 '16 at 06:47

score 0 · Answer 3 · edited May 23 '17 at 11:59


You can use the HTML Agility Pack instead of HtmlDocument; then you can do deep parsing of the HTML. For more information, see the following link: How to use HTML Agility pack

answered Feb 17 '16 at 06:19

Jaswinder Singh

HtmlDocument is from Html Agility Pack – Tyress Feb 17 '16 at 06:32
