I am trying to retrieve a sitemap in XML format.

It seems to work, but it only returns about 2/3 of the actual XML:

            using (var client = new WebClient())
            {
                // Download the sitemap as a string
                string html = client.DownloadString("https://int.lowrance.com/en-au/sitemap.xml");
                var doc = new HtmlDocument();
                doc.LoadHtml(html);

                // Select every <loc> node once, outside the loop
                var ach = doc.DocumentNode.SelectNodes("//loc");

                foreach (var node in ach)
                {
                    var inner = node.InnerHtml;
                }
            }

I have also tried:

            XmlDocument doc1 = new XmlDocument();                
            doc1.Load("https://www.lowrance.com/en-au/sitemap.xml");

but when debugging I get the same results.

At first I thought this was working, then realized it was not returning all the XML. Any help would be much appreciated.

  • If it's XML, why are you loading it into an `HtmlDocument`? Might as well use `XElement` and get validation to boot. The nodes you're looking for live in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, which is why you won't find them with a naive `//loc` (well, in XML; I don't know how your `HtmlDocument` implementation will behave). With `XElement`, namespace management is easy, but if you don't want to be bothered you can use `//*[local-name()='loc']` and ignore the namespace entirely. – Jeroen Mostert Jan 24 '19 at 14:03
  • Why the `HtmlDocument` usage? I would choose XML-related classes instead. – Cleptus Jan 24 '19 at 14:03
  • Do you mean you only get 2/3 of the nodes you expect, or that the returned string is missing 1/3? – PaulF Jan 24 '19 at 14:03
  • @PaulF Yes, it is only returning 2/3 and missing 1/3. I did try using XmlDocument but got the same results. I have updated the question. – Mark Moonie Griffiths Jan 24 '19 at 14:16
  • You may want to specify how you're measuring "2/3" and "1/3". You can do some quick verifying with PowerShell: `(Invoke-RestMethod https://int.lowrance.com/en-au/sitemap.xml).urlset.url.loc.Count` gives me 77 `loc`s. Do you get fewer in your code? Is it possible the result is actually dynamic based on whether you're logged in, cookies, browser agent or something like that? – Jeroen Mostert Jan 24 '19 at 14:22
  • @JeroenMostert Hi, I am just loading the page in a browser and then doing a find to see where the last "loc" returned in the debugger is in the file. – Mark Moonie Griffiths Jan 24 '19 at 14:28
  • Possibly timing out before the full string is returned. See this: https://stackoverflow.com/questions/1789627/how-to-change-the-timeout-on-a-net-webclient-object – PaulF Jan 24 '19 at 14:29
  • @PaulF It's not timing out, as it returns after about 1 or 2 seconds. – Mark Moonie Griffiths Jan 24 '19 at 14:37
  • I get similar timing. html string length is 16539, around 465 lines of XML, 77 nodes. This matches what I get in the browser. – PaulF Jan 24 '19 at 14:41
  • I've also tried this using `XmlDocument doc1 = new XmlDocument(); doc1.Load("https://www.lowrance.com/en-au/sitemap.xml");` and get an exact match to what is expected. Try writing the XML document out, as maybe the debugger is cutting off the string, if that's how you are evaluating it? – MikeS Jan 24 '19 at 14:51
  • If I download this site in Firefox or Chrome, I get back a document with 99 nodes. Looks like the site is serving up its content dynamically based on headers, or through a cache of some kind. The code isn't the (immediate) problem. – Jeroen Mostert Jan 24 '19 at 14:51
  • So in the debugger I get 77, but when I load the browser and download the file I get 2361 lines with a length of 121,240. – Mark Moonie Griffiths Jan 24 '19 at 14:55
  • In Chrome & IE11 I get 77 nodes. – PaulF Jan 24 '19 at 15:02
  • I can now get back 99 nodes in PowerShell too, without tweaking headers, where the same code previously gave 77. The site is giving back different documents based on *something* -- phase of the moon, Cloudflare host, their own load balancing, random number, something like that. – Jeroen Mostert Jan 24 '19 at 15:02
  • @JeroenMostert This is the answer!! What I think I will do is download it locally, then load that file back in to deal with the XML. Thank you all for your help. – Mark Moonie Griffiths Jan 24 '19 at 15:06
  • Unfortunately this did not work. I even ran it locally and it still ends at exactly the same node every time, even though the XML is all valid. – Mark Moonie Griffiths Jan 25 '19 at 11:03
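
For reference, here is a minimal sketch of the namespace-aware approach suggested in the comments, using LINQ to XML instead of `HtmlDocument`. It is not a confirmed fix for the missing nodes: the thread suggests the server may return different documents per request. `SitemapReader` and `ExtractLocs` are hypothetical names introduced for illustration, and the inline sample stands in for the downloaded sitemap. Printing the string length and the node count from code, rather than reading them off the debugger, makes it easier to compare against what the browser shows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapReader
{
    // Namespace from the sitemap protocol; <loc> elements live in it,
    // so they must be queried with a namespace-qualified name.
    private static readonly XNamespace Sitemap =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Hypothetical helper: returns the text of every <loc> element
    // found in the given sitemap XML string.
    public static List<string> ExtractLocs(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants(Sitemap + "loc")
                  .Select(e => e.Value.Trim())
                  .ToList();
    }

    public static void Main()
    {
        // Assumption: in real use this string would come from
        // WebClient.DownloadString(...) as in the question.
        const string sample = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>";

        List<string> locs = ExtractLocs(sample);

        // Report the raw length and node count so they can be compared
        // with the browser's copy instead of a debugger preview.
        Console.WriteLine("Length: " + sample.Length);
        Console.WriteLine("loc count: " + locs.Count);
        foreach (string loc in locs)
        {
            Console.WriteLine(loc);
        }
    }
}
```

If the count printed here differs between runs against the live URL, that would point at the server rather than the parsing code.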
