HtmlAgilityPack NullReferenceException Error

Question

I'm trying to get text, but I always get a NullReferenceException. I'm going crazy. It worked for a moment, but then I started getting this error again. I just want to get the post titles.

I've tried changing my XPath; I tried several different XPaths.

    private void button1_Click(object sender, EventArgs e)
    {
        listView1.Items.Clear();

        for (int i = 4; i < 9; i++)
        {
            VeriAl(Url: "https://cracked.to/Forum-Combolists?sortby=started&order=desc&datecut=9999&prefix=0", XPath: "//table[@class='tborder clear']//tr[" + i + "]//td[2]//div[1]//span[1]//span[1]//a", tag: "title",CikanSonuc: listView1);
        }
    }

    public void VeriAl (String Url, String XPath, String tag,ListView CikanSonuc)
    {
        try
        {
            url = new Uri(Url);
        }
        catch (UriFormatException)
        {
            if (MessageBox.Show(text: "UriFormatException", caption: "Hata", buttons: MessageBoxButtons.OK, icon: MessageBoxIcon.Error) == DialogResult.OK)
            {

            }
        }
        catch (ArgumentNullException)
        {
            if (MessageBox.Show(text: "ArgumentNullException", caption: "Hata", buttons: MessageBoxButtons.OK, icon: MessageBoxIcon.Error) == DialogResult.OK)
            {

            }
        }

        WebClient client = new WebClient();
        try
        {
            html = client.DownloadString(url);
        }
        catch (WebException)
        {
            if (MessageBox.Show(text: "WebException", caption: "Hata", buttons: MessageBoxButtons.OK, icon: MessageBoxIcon.Error) == DialogResult.OK)
            {

            }
        }

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        try
        {
            CikanSonuc.Items.Add(doc.DocumentNode.SelectSingleNode(XPath).Attributes[tag].Value);
        }
        catch (NullReferenceException)
        {
            if (MessageBox.Show(text: "NullReferenceException", caption: "Hata", buttons: MessageBoxButtons.OK, icon: MessageBoxIcon.Error) == DialogResult.OK)
            {

            }
        }
    }

Have you pulled out the command that gets the value and ensured it's actually getting a value? — Ben, Jan 07 '19 at 21:54
Please update your question to include the contents of `html`. Also please split `CikanSonuc.Items.Add(doc.DocumentNode.SelectSingleNode(XPath).Attributes[tag].Value)` over multiple lines of code (each with `;` at the end of the line) in your question, so that each operation has only a single `.` in it. Then tell us which of the lines throws the exception. — mjwills, Jan 07 '19 at 22:22
The URL you are trying to hit performs a client-side redirect to itself after a few seconds of showing a loading message. The table you are trying to access is actually not there in the downloaded html as the downloaded html is of the loading page. — Piyush Khanna, Jan 07 '19 at 22:25
It is not going to be straightforward to reach the page you want. — Piyush Khanna, Jan 07 '19 at 22:33
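Following mjwills' suggestion, here is a minimal sketch that splits the chained call into separate steps so you can see which one comes back null. `TitleExtractor.GetTitle` is a hypothetical helper name; it only relies on HtmlAgilityPack's `SelectSingleNode`, which returns null when the XPath matches nothing, and the `Attributes[name]` indexer, which returns null when the attribute is missing.

```csharp
using System;
using HtmlAgilityPack;

public static class TitleExtractor
{
    // Hypothetical helper: returns the attribute value, or null if any step finds nothing.
    public static string GetTitle(string html, string xPath, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 1: null here means the XPath matched nothing, e.g. because the
        // downloaded HTML is a loading page that doesn't contain the table.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xPath);
        if (node == null)
        {
            Console.WriteLine("XPath matched nothing: " + xPath);
            return null;
        }

        // Step 2: null here means the element exists but has no such attribute.
        HtmlAttribute attribute = node.Attributes[attributeName];
        if (attribute == null)
        {
            Console.WriteLine("Element has no '" + attributeName + "' attribute.");
            return null;
        }

        return attribute.Value;
    }
}
```

If step 1 reports a miss even though the XPath works in your browser's dev tools, compare it against the HTML you actually downloaded rather than the page the browser shows.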

score 1 · Answer 1 · answered Jan 07 '19 at 22:53

The problem is that the content doesn't exist yet at the time you download the HTML. WebClient.DownloadString() only fetches the raw HTML; it doesn't execute the JavaScript that loads the content. To get the fully loaded page, you can load it in a WebBrowser control and read the content after it has finished loading:

    using System;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    public static class WebViewExtension
    {
        public static HtmlAgilityPack.HtmlDocument GetHtmlDocument(this WebBrowser wView)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(wView.Document.Body.OuterHtml);
            return doc;
        }

        public static async Task<HtmlAgilityPack.HtmlDocument> LoadSiteAndGetHtml(this WebBrowser wView, string siteurl)
        {
            await wView.NavigateAndWait(siteurl);
            HtmlAgilityPack.HtmlDocument doc = wView.GetHtmlDocument();
            return doc;
        }

        public static async Task NavigateAndWait(this WebBrowser wView, string siteurl)
        {
            TaskCompletionSource<bool> loaded = new TaskCompletionSource<bool>();

            WebBrowserDocumentCompletedEventHandler handler = null;
            handler = (sender, args) =>
            {
                // Detach so later navigations don't keep calling this handler.
                wView.DocumentCompleted -= handler;
                loaded.TrySetResult(true);
            };

            // Subscribe before navigating so the event cannot be missed.
            wView.DocumentCompleted += handler;
            wView.Navigate(new Uri(siteurl));

            // Wait until the website is loaded. If the page redirects via script,
            // this may complete on the first (loading) page.
            await loaded.Task;
        }
    }

You can use these methods like this:

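    // Must run on the UI thread inside an async method (e.g. an async void event handler).
    WebBrowser client = new WebBrowser();
    HtmlAgilityPack.HtmlDocument doc = null;
    try
    {
        // LoadSiteAndGetHtml takes a string, so pass the Uri's text.
        doc = await client.LoadSiteAndGetHtml(url.AbsoluteUri);
    }
    catch (WebException)
    {
        if (MessageBox.Show(text: "WebException", caption: "Hata", buttons: MessageBoxButtons.OK, icon: MessageBoxIcon.Error) == DialogResult.OK)
        {

        }
    }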
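Once you have the loaded document, here is a sketch for reading all the titles from it in one pass, instead of downloading the page once per row as the question's loop does. It assumes the question's XPath still matches the loaded page (not verified here) and folds the row index `i` into a `position()` predicate; `TitleList.GetTitles` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TitleList
{
    // Hypothetical helper: collects the 'title' attribute of rows 4 to 8,
    // mirroring the for (int i = 4; i < 9; i++) loop in the question.
    public static List<string> GetTitles(HtmlDocument doc)
    {
        const string xPath =
            "//table[@class='tborder clear']//tr[position() >= 4 and position() < 9]" +
            "//td[2]//div[1]//span[1]//span[1]//a";

        var titles = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xPath);
        if (links == null)
            return titles;

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue returns the default ("" here) when the attribute is missing.
            string title = link.GetAttributeValue("title", "");
            if (title.Length > 0)
                titles.Add(title);
        }
        return titles;
    }
}
```

The null check on `SelectNodes` matters: forgetting it and iterating directly is another common way to hit the same NullReferenceException.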

score 0 · Answer 2 · edited Dec 04 '21 at 17:14

The URL https://cracked.to/Forum-Combolists?sortby=started&order=desc&datecut=9999&prefix=0 displays a loading screen for a few seconds and then redirects to itself, showing you the content you want to see. Before redirecting, it also sets some browser cookies so that next time you see the content instead of the loading screen. The redirect is done client-side by a script.

When you download the page in C#, you actually get the source of that loading page rather than the actual content. The downloaded HTML doesn't even contain the elements you are trying to access, hence the NullReferenceException.

Solving this is going to be tricky because the redirect happens client-side. You might end up scraping the page with a browser instance in order to retrieve the HTML after the redirect, probably along the lines of the question "Scraping webpage generated by JavaScript with C#".
