My code:

public static (HtmlNodeCollection title, HtmlNodeCollection price) ParsingNodesTP()
{
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load("https://rozetka.com.ua/ua/search/?text=Asus+Zenbook+14&producer=asus&page=1");
    var titles = doc.DocumentNode.SelectNodes("//a[@class='goods-tile__heading ng-star-inserted']//span");
    var price = doc.DocumentNode.SelectNodes("//div[@class='goods-tile__prices']//div[@class='goods-tile__price price--red ng-star-inserted']//p//span[@class='goods-tile__price-value']");
    return (titles, price);
}

The error I got:

System.NullReferenceException: 'Object reference not set to an instance of an object.'

Where is the problem?

Joel Coehoorn
  • This website is protected by Cloudflare, so the HTML loaded into the `doc` object is the Cloudflare DDoS protection page, which does not include the tags you are looking for. This question may help: https://stackoverflow.com/questions/32425973/how-can-i-get-html-from-page-with-cloudflare-ddos-portection – Mahmoud Farahat Aug 07 '22 at 08:54
  • Please debug your code and make sure the fields of the doc object are filled. If doc is null, you cannot access DocumentNode. Besides, if the web page fails to load, the specified nodes will not be found either. – ahmet gül Aug 08 '22 at 13:53
  • doc isn't null @ahmetgül – Олександр Павлюк Aug 10 '22 at 09:25
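
The comments above point at the likely cause: in HtmlAgilityPack, `SelectNodes` returns `null` (not an empty collection) when the XPath matches nothing, so `doc` can be non-null while `titles` and `price` are null. A minimal sketch of a guard, assuming the page that came back is a protection page without the product markup; the class name, helper name, and inline HTML are hypothetical stand-ins:

```csharp
using System;
using HtmlAgilityPack;

public static class NullCheckSketch
{
    // Hypothetical helper: counts XPath matches, treating a null result as zero.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the expression.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Stand-in for a page that lacks the expected product tiles.
        string html = "<html><body><p>Checking your browser</p></body></html>";

        int count = CountMatches(html, "//span[@class='goods-tile__price-value']");
        Console.WriteLine(count == 0
            ? "No matching nodes; inspect doc.DocumentNode.OuterHtml to see what was loaded."
            : "Found " + count + " nodes.");
    }
}
```

Checking the result for null before iterating turns the exception into a diagnosable "nothing matched" case.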

1 Answer

F12 is your friend in any browser. Look at the Network tab. The data you are interested in comes from a request of type "xhr", so there is no need to use HtmlAgilityPack. All you need to do is parse the JSON returned by that request's URL.

  1. Write code to download the JSON string from the URL. The following code works for me:

        using System.Net;

        // Declared outside the using block so it is still in scope for the deserialization step.
        string downloadedJson;

        using (WebClient wc = new WebClient())
        {
            // Send browser-like headers along with the request.
            wc.Headers.Add("accept", "application/json, text/plain, */*");
            wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36");

            downloadedJson = wc.DownloadString("https://search.rozetka.com.ua/ua/search/api/v6/?front-type=xl&country=UA&lang=ua&producer=asus&page=1&text=Asus+Zenbook+14");
        }
    
  2. Copy the whole result (the value of downloadedJson) to your clipboard.

  3. In Visual Studio, create a new class file.

  4. Click Edit > Paste Special > Paste JSON as Classes. In your code you will need the name of the first class that was pasted. It is the parent class, called Rootobject by default.

  5. Install the Newtonsoft.Json NuGet package and deserialize the downloaded string:

        using Newtonsoft.Json;

        Rootobject obj = JsonConvert.DeserializeObject<Rootobject>(downloadedJson);

Now you can loop through the goods array inside obj.data to extract all of the product info you need.

        Good[] goods = obj.data.goods;
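
To make the last step concrete, here is a minimal sketch of deserializing and looping. The classes are hypothetical, trimmed stand-ins for what "Paste JSON as Classes" generates: the `title` and `price` members, the `GoodsSketch` class, and the inline sample JSON are assumptions for illustration, not the real API response.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical minimal classes; use the generated ones in real code.
public class Rootobject { public Data data { get; set; } }
public class Data { public Good[] goods { get; set; } }
public class Good
{
    public string title { get; set; }   // assumed field name
    public decimal price { get; set; }  // assumed field name
}

public static class GoodsSketch
{
    // Deserializes the JSON and returns the goods, or an empty array if any level is missing.
    public static Good[] ParseGoods(string json)
    {
        Rootobject obj = JsonConvert.DeserializeObject<Rootobject>(json);
        return obj?.data?.goods ?? Array.Empty<Good>();
    }

    public static void Main()
    {
        // Made-up sample payload standing in for downloadedJson.
        string sampleJson = "{\"data\":{\"goods\":[{\"title\":\"Sample laptop\",\"price\":100}]}}";

        foreach (Good good in ParseGoods(sampleJson))
        {
            Console.WriteLine($"{good.title}: {good.price}");
        }
    }
}
```

Returning an empty array when `data` or `goods` is absent keeps the loop from throwing the same NullReferenceException the question started with.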
Ole EH Dufour