I have tried many methods to deserialise this XML from a URL, but none were successful, due to what I believe is an encoding issue.

If I right-click and download the file, then deserialise it from my C drive, it works fine.

So I decided to try downloading the file first and then processing it. But the file downloaded via code is in the wrong encoding as well!

I don't know where to start, but I'm thinking maybe I should force a UTF-8 or UTF-16 encoding when downloading?

Here is the download code:

using (var client = new WebClient())
{
    client.DownloadFile("http://example.com/my.xml", "my.xml");
}

How to download a file from a URL in C#?

[Image: the file as downloaded]

Mcloving
  • Have you tried an editor other than Notepad? – Gusman Apr 28 '16 at 13:22
  • @Gusman I just tried Notepad++. It looks different but just as crazy. – Mcloving Apr 28 '16 at 13:25
  • @CharlesMager Unrecognised header? Unsure. Setting the encoding to UTF8 as below seemed to work. (It's had me baffled for hours; I tried 10 or so different methods [streams/reads].) – Mcloving Apr 28 '16 at 13:29
  • It's something odd! Sorry, I deleted my comment after I saw you'd fixed it. It's very odd that `feed.xml` would appear in an XML file, and given UTF8 is a superset of ASCII you'd expect to see most of the content if it was attempting to decode as that. – Charles Mager Apr 28 '16 at 13:31
  • @CharlesMager Something worked, but it no longer works when I re-run :| – Mcloving Apr 28 '16 at 13:31
  • It could be compressed... is the URL publicly accessible, can you share it? – Charles Mager Apr 28 '16 at 13:33
  • @CharlesMager Unfortunately not; however, there is supposed to be a gz extension to the API for compression. I doubt it would return me that though, as it's a different URL? url/file.xml vs url/file.gz – Mcloving Apr 28 '16 at 13:35
  • Try adding an accept header to say you want XML - `client.Headers.Add("Accept","application/xml")` – Charles Mager Apr 28 '16 at 13:48
  • @CharlesMager I'll try that, two seconds. Inspired by your comments, I just put a .gz extension on my downloaded file, then extracted it, and... there is my XML file :) Post an answer of sorts. It definitely helped me. The file inside the gz is why there is a name in the text. – Mcloving Apr 28 '16 at 13:49
  • @CharlesMager Adding that hasn't helped; however, I can look at extracting it first. – Mcloving Apr 28 '16 at 13:52

2 Answers

Try this

using (var client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    client.DownloadFile("http://example.com/my.xml", "my.xml");
}

JOSEFtw
  • Just tried `Encoding.UTF8` and `Encoding.ASCII`, writing to separate files. `UTF8` as you suggested fixed it! :) And `ASCII` looks like the garbage I was getting, so I assume that is the default. – Mcloving Apr 28 '16 at 13:28
  • Actually I take that back: something worked, but when I run it again it doesn't. – Mcloving Apr 28 '16 at 13:31

The file was in fact in gzip format, despite coming from an XML URL.

My connections must have been accepting gzip, so the server responded with it. This happened even though I tried several different methods and variations (downloading, streaming to a string, parsing a string from the URL, etc.).

The solution for me was to download the file, then decompress the gzip data before deserialising. Telling the server not to send gzip didn't work for me, but it may be an option for some.
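
A minimal sketch of the download-then-decompress approach, assuming the response body is a plain gzip stream (gzip data starts with the bytes 0x1F 0x8B). The class name `GzipXmlDownload` and its helper methods are hypothetical, made up for illustration; the URL and file name are the placeholders from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;

// Hypothetical helper class, not part of any library.
static class GzipXmlDownload
{
    // Gzip streams begin with the two magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Returns the payload unchanged unless it starts with the gzip magic bytes.
    public static byte[] DecompressIfGzip(byte[] data)
    {
        if (!LooksLikeGzip(data))
        {
            return data;
        }

        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // Downloads the raw bytes, decompresses them if needed, and writes the result to disk.
    public static void DownloadXml(string url, string path)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            File.WriteAllBytes(path, DecompressIfGzip(raw));
        }
    }
}
```

Calling `GzipXmlDownload.DownloadXml("http://example.com/my.xml", "my.xml")` should leave plain XML on disk whether or not the server compressed the response, since data without the gzip magic bytes is written through unchanged. Checking the leading bytes rather than the URL extension matters here because the `.xml` URL still returned gzip data.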

Mcloving