
I need to get the bitcoin price from https://coinmarketcap.com/currencies/bitcoin/ using Html Agility Pack. I am using this example, which works fine:

var html = @"http://html-agility-pack.net/";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var node = htmlDoc.DocumentNode.SelectSingleNode("//head/title");
Console.WriteLine("Node Name: " + node.Name + "\n" + node.OuterHtml);   

The XPath is: //*[@id="__next"]/div/div[1]/div[2]/div/div[1]/div[2]/div/div[2]/div[1]/div

The HTML:

<div class="priceValue "><span>$17,162.42</span></div>

I have tried the code below, but it returns "Object reference not set to an instance of an object":

var html = @"https://coinmarketcap.com/currencies/bitcoin/";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var node = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='priceValue']/span");
Console.WriteLine("Node Name: " + node.Name + "\n" + node.InnerText);
Stefan Wuebbe
  • Does this answer your question? [What is a NullReferenceException, and how do I fix it?](https://stackoverflow.com/questions/4660142/what-is-a-nullreferenceexception-and-how-do-i-fix-it) –  Dec 10 '22 at 10:56
  • Side note: It is important to remember that HtmlAgilityPack is **NOT** like a web browser. It does **NOT** execute javascript or wasm. Therefore, it may be misleading to use a real web browser (like Firefox or Chrome) to inspect the page structure you want to process, because you can't tell whether the structure you see in the browser inspector has been dynamically created/altered by some javascript/wasm script. (In other words, the HTML structure you see in a web browser inspector might not be the actual original source html as provided by the web server.) –  Dec 10 '22 at 11:02
  • While this wouldn't use `HtmlWeb()`, wouldn't a more preferred solution be to use [the API](https://coinmarketcap.com/api/) provided by the website in question? – Juris Dec 10 '22 at 17:16

1 Answer


TLDR:

  1. You need to tell HtmlWeb to decompress the response (or use a proper HTTP client)
  2. You need to fix the XPath selector

Obviously the `SelectSingleNode()` call returns null because it can't find the node.

In cases like this it is helpful to inspect the loaded HTML. You can do this by getting the value of `htmlDoc.DocumentNode.InnerHtml`. I tried this and the "HTML" produced was gibberish.
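For example, a quick way to see what HtmlAgilityPack actually received (a small debugging sketch reusing the URL from the question):

var web = new HtmlWeb();
var htmlDoc = web.Load("https://coinmarketcap.com/currencies/bitcoin/");
// Dump the parsed markup so you can check whether the expected elements are even there
Console.WriteLine(htmlDoc.DocumentNode.InnerHtml);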

The reason is that `HtmlWeb` doesn't decompress the response it gets by default. See this github issue for details. I don't think you would have this problem if you used a proper HTTP client (like this one), or if the HtmlAgilityPack devs were more proactive.
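For reference, the HttpClient route could look roughly like this (a sketch only; it assumes a .NET version with top-level statements and `await`, and the variable names are illustrative):

using System;
using System.Net;
using System.Net.Http;
using HtmlAgilityPack;

// Let the HTTP stack handle gzip/deflate decompression transparently
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using var client = new HttpClient(handler);
string pageHtml = await client.GetStringAsync("https://coinmarketcap.com/currencies/bitcoin/");

// Parse the downloaded markup with HtmlAgilityPack
var doc = new HtmlDocument();
doc.LoadHtml(pageHtml);
var priceNode = doc.DocumentNode.SelectSingleNode("//div[contains(@class, 'priceValue')]/span");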

If you insist on using HtmlWeb, your code should look like this:

using System;
using System.Net;
using HtmlAgilityPack;

const string html = @"https://coinmarketcap.com/currencies/bitcoin/";

// Tell HtmlWeb to decompress the gzip-encoded response from the server
var web = new HtmlWeb
{
    AutomaticDecompression = DecompressionMethods.GZip
};
HtmlDocument doc = web.Load(html);

// Note the trailing space in the class name: the page markup is class="priceValue "
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='priceValue ']/span");

Notice that the class of the element you are looking for is actually `priceValue ` (with a space character at the end), and there is another div with a class of `priceValue` elsewhere on the page. This is a separate issue, however, and you should eventually be able to find a more robust selector. Maybe something like this:

HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[contains(@class, 'priceSection')]//div[contains(@class, 'priceValue')]/span");
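Whichever selector you settle on, it is worth null-checking the result before reading `InnerText`, since a missing node is exactly what produced the original NullReferenceException. A small sketch:

if (node != null)
{
    // e.g. "$17,162.42"
    Console.WriteLine("Bitcoin price: " + node.InnerText);
}
else
{
    Console.WriteLine("Price node not found - inspect doc.DocumentNode.InnerHtml and adjust the selector.");
}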
Asotos