
I am trying to get a table from the web page https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/ using HtmlAgilityPack.

My code so far is:

    WebClient webClient = new WebClient();
    string page = webClient.DownloadString("https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/");

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(page);

    List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='list_result Result']")
        .Descendants("tr")
        .Skip(1)
        .Where(tr => tr.Elements("td").Count() > 1)
        .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
        .ToList();

My problem is that the web page builds the table with JavaScript. The HTML I download only contains a notice that I must enable JavaScript, so SelectSingleNode finds no table, returns null, and the code throws a NullReferenceException.

I also tried a plain GET request:

    string Url = "https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/";
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
    myRequest.Method = "GET";
    WebResponse myResponse = myRequest.GetResponse();
    StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
    string result = sr.ReadToEnd();
    sr.Close();
    myResponse.Close();

with the same result. I have already enabled JavaScript in Internet Explorer and changed the registry as well:

    // Verbatim (@) strings take single backslashes; doubled ones end up in the key path
    if (Environment.Is64BitOperatingSystem)
        Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION", true);
    else  // For 32-bit machine
        Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION", true);

If I use a WebBrowser component I can see the web page without a problem, but I still can't get the table into a list.

Heretic Monkey
rippergr

2 Answers

3

F12 is your friend in any browser.

Select the Network tab and you'll notice that all of the data comes from this file:

https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd201806.xml

(I suppose that the data for July 2018 will be held at a URL ending in *.dd201807.xml)

In C#, do a GET for that URL and parse the response as XML; there is no need for HtmlAgilityPack. To pick the right URL, concatenate the current four-digit year with the current two-digit month.
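A minimal sketch of that approach follows. It assumes the file name pattern seen in the Network tab (ddYYYYMM.xml) holds for every month, which is only a guess based on the June 2018 URL. The class and method names (ExchangeRateFeed, BuildUrl, DownloadAsync, ListElementNames) are made up for illustration, and because the XML layout is not described here, the sketch only lists the element names it finds rather than assuming a schema.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class ExchangeRateFeed
{
    // Assumption: every month follows the pattern of the June 2018 file.
    private const string UrlPattern =
        "https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd{0:yyyyMM}.xml";

    // Builds the URL for the month that contains the given date.
    public static string BuildUrl(DateTime month) =>
        string.Format(CultureInfo.InvariantCulture, UrlPattern, month);

    // Downloads the file with a plain GET and parses it as XML.
    public static async Task<XDocument> DownloadAsync(HttpClient client, DateTime month)
    {
        string xml = await client.GetStringAsync(BuildUrl(month));
        return XDocument.Parse(xml);
    }

    // Lists the distinct element names in document order,
    // a first step for discovering the structure of the file.
    public static IEnumerable<string> ListElementNames(XDocument doc) =>
        doc.Descendants().Select(e => e.Name.LocalName).Distinct();
}
```

Calling ExchangeRateFeed.DownloadAsync(new HttpClient(), DateTime.Today) and passing the result to ListElementNames should show which elements hold the currency and rate values, after which the rows can be read with ordinary LINQ to XML queries.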

I can't make it any more fun than that!

Ole EH Dufour
  • Great, that is exactly what I was looking for, and maybe better. I was searching for the table but I couldn't find any code. That tip about the Network tab is great. – rippergr Jun 24 '18 at 18:32
  • I am having the same problem with the following URL. I need to get the table data in the div packageTabContainer, which is created by JavaScript. Could you please suggest something? https://www.ikea.com/qa/en/catalog/products/60368726/ – Khan Engineer Jun 27 '18 at 15:49
  • @KhanEngineer I don't see a div with that ID. I'd recommend asking a new question. – Ole EH Dufour Jun 27 '18 at 17:27
  • Packages: 1
    Article Number | Packages | Width | Height | Length | Diameter | Weight
    60368726 | 1 | 20 cm | 3 cm | 27 cm | - | 0.06 kg
    Here it is.
    – Khan Engineer Jun 28 '18 at 06:20
  • This div is inside the div with id packageInfo. – Khan Engineer Jun 28 '18 at 06:21
  • @KhanEngineer I had a look with F12, but I can't find any external file holding that info. Can't you fetch it using HtmlAgilityPack? – Ole EH Dufour Jun 28 '18 at 11:15
  • The info isn't in an external file. It is in the same page, inside the div with id mainpadding, which contains the JS object jProductData. Search for it and you will find it easily. – Khan Engineer Jun 28 '18 at 15:17
  • @Ole EH Dufour Did you check the link? – Khan Engineer Jul 06 '18 at 16:45
  • To see the data, go to the Network tab and filter by XHR to see the requests the page uses to talk to the server. – TJacken Jul 31 '20 at 08:36
0

WebClient is an HTTP client, not a web browser, so it won't execute JavaScript. What you need is a headless web browser. The question linked below lists several; I have not tried any of them, so I cannot recommend one:

Headless browser for C# (.NET)?

Jean-François Beauchamp