HTML Agility Pack can't get text content from div

Question

I am new to C# and wanted to build a small scraper to try things out, after watching a YouTube video on the topic. I am trying to scrape bet365.dk (more specifically this link: https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/).

This is my code:

using System;
using System.Net.Http;
using HtmlAgilityPack;

namespace Bet365Scraper
{
    class Program
    {
        static void Main(string[] args)
        {
           GetHtmlAsync();
           Console.ReadLine();
        }

        private static async void GetHtmlAsync()
        {
            var url = "https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/";

            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
            var html = await httpClient.GetStringAsync(url);

            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            var htmlBody = htmlDocument.DocumentNode.SelectSingleNode("//body");
            var node = htmlBody.Element("//div[@class='src-ParticipantFixtureDetailsHigher_TeamNames ']");

            Console.WriteLine(node.InnerHtml);
        }

    }
}

I am not sure how to do this. I find the documentation on the HTML Agility Pack site a bit confusing, and I cannot seem to find exactly what I am looking for. Here is what I want to do. This is the relevant piece of HTML on the bet365 site:

<div class="src-ParticipantFixtureDetailsHigher_TeamNames">
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
       <div class="src-ParticipantFixtureDetailsHigher_Team " style="">Færøerne</div>
    </div>
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
        <div class="src-ParticipantFixtureDetailsHigher_Team ">Andorra</div>
    </div>
</div>

How can I print both 'Færøerne' and 'Andorra' from the divs in one go? I am aware that I probably need a foreach, but as I said, I'm not sure how to use the selectors.

Are you familiar with JavaScript `querySelector` or jQuery syntax? — aepot, Oct 13 '20 at 18:12

aepot · Accepted Answer · 2020-10-13T18:47:02.417


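I'm not familiar with XPath, but I know JS query syntax, so I suggest additionally installing the Fizzler.Systems.HtmlAgilityPack NuGet package.

The QuerySelector() and QuerySelectorAll() extension methods then become available on HtmlNode. They accept CSS selector syntax, the same syntax that JavaScript's querySelector uses.

I also fixed the HttpClient usage: a single static instance is reused, and the async method returns Task so that Main can await it.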

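using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // provides QuerySelector/QuerySelectorAll

namespace Bet365Scraper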
{
    class Program
    {
        private static readonly HttpClient httpClient = new HttpClient();

        static async Task Main(string[] args)
        {
           httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
           await GetHtmlAsync("https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/");
           Console.ReadLine();
        }

        private static async Task GetHtmlAsync(string url)
        {
            var html = await httpClient.GetStringAsync(url);

            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            var nodes = htmlDocument.DocumentNode.QuerySelectorAll(".src-ParticipantFixtureDetailsHigher_Team");
            foreach (HtmlNode node in nodes)
            {
                Console.WriteLine(node.InnerText);
            }
        }
    }
}
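
For comparison, the same selection can probably be done with HAP's built-in XPath support instead of Fizzler. The following is only a minimal sketch: the names TeamNameExtractor, ExtractTeamNames and XPathDemo are hypothetical, and the HTML is the fragment from the question inlined as a string, which assumes those divs are actually present in the text passed to LoadHtml.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of the original answer.
static class TeamNameExtractor
{
    // Matches any div whose class list contains the exact token, regardless of
    // extra whitespace such as the trailing space in the question's markup.
    private const string TeamXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), " +
        "' src-ParticipantFixtureDetailsHigher_Team ')]";

    public static List<string> ExtractTeamNames(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var names = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = document.DocumentNode.SelectNodes(TeamXPath);
        if (nodes == null)
        {
            return names;
        }

        foreach (HtmlNode node in nodes)
        {
            names.Add(node.InnerText.Trim());
        }

        return names;
    }
}

static class XPathDemo
{
    static void Main()
    {
        // The fragment from the question, inlined so the sketch runs offline.
        const string html = @"
<div class=""src-ParticipantFixtureDetailsHigher_TeamNames"">
    <div class=""src-ParticipantFixtureDetailsHigher_TeamWrapper "">
        <div class=""src-ParticipantFixtureDetailsHigher_Team "" style="""">Færøerne</div>
    </div>
    <div class=""src-ParticipantFixtureDetailsHigher_TeamWrapper "">
        <div class=""src-ParticipantFixtureDetailsHigher_Team "">Andorra</div>
    </div>
</div>";

        foreach (string name in TeamNameExtractor.ExtractTeamNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

The null check matters here: if the elements are injected by JavaScript after the page loads, they will not be in the string returned by GetStringAsync, and SelectNodes will return null rather than an empty list.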

edited Oct 13 '20 at 18:47

answered Oct 13 '20 at 18:35

aepot


This is what you meant, right? https://stackoverflow.com/questions/40570656/scrape-an-html-page-after-ajax-calls-for-elements-not-in-the-page-source – KPsanz Oct 18 '20 at 15:25
@KPsanz Yep. :) – aepot Oct 18 '20 at 15:27
I actually managed to locate the external HTML file in the network log; it was: https://www.bet365.dk/SportsBook.API/web?lid=7&zid=0&pd=%23AC%23B1%23C1%23D451%23F2%23Q1%23F%5E12%23&cid=54&ctid=54 However, the site seems to use a very confusing format that I have never seen before. Do you know what this format is? – KPsanz Oct 18 '20 at 15:50
@KPsanz I don't have access to the link; take a screenshot. – aepot Oct 18 '20 at 16:57
@KPsanz I don't know what the format is, but it's text. You can load it into a `string` and parse it. – aepot Oct 18 '20 at 18:01
