
I am having trouble retrieving the contents of an HTTP GET request in the correct charset.

I tried several pieces of code, such as the following:

HttpClient h = new HttpClient();
//Content-Type: text/html; charset=UTF-8

// Note: the page content is in Hebrew.
var resp = h.GetAsync("http://www.wnf.co.il");
var content = resp.Result.Content;

// Replace the response's Content-Type header with one that declares UTF-8
content.Headers.Remove("Content-Type");
content.Headers.Add("Content-Type", "text/html; charset=utf-8");
var res = content.ReadAsStringAsync();
var s = res.Result;

Console.WriteLine(s);

This still does not help; I still get the content in the wrong encoding.


This post clarifies that setting the charset in the request headers will not help; it is the response's Content-Type header that needs to be set. (Besides, you get an error when trying to add a "Content-Type" header to the request headers.)

But I still cannot retrieve the content in the proper charset (UTF-8).

What am I missing?

I have been doing similar things with Hebrew sites for a while. Comparing the response headers in Fiddler for this site and for others where I do not have this problem, the only difference I see is indeed the Content-Type header in the response.

Community
Veverke
  • So what is the actual encoding of the content, and what does the Content-Type header tell you? When those differ, most HTTP clients return mojibake; you might have to get the response body as a byte array and decode the string from the bytes yourself. – CodeCaster Feb 18 '16 at 11:15
  • Fiddler shows the response Content-Type for this site (where the response looks correct) as UTF-8 with media type "text/html", which is exactly what I am trying to set. – Veverke Feb 18 '16 at 11:19
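CodeCaster's suggestion of decoding the bytes yourself can be sketched as follows: read the body as a byte array, then decode it with the charset named in the Content-Type header when .NET recognizes that name, and fall back to a caller-chosen encoding (UTF-8 here) otherwise. The class and method names (`CharsetDecoder`, `Decode`) are hypothetical, and stripping quotes and spaces from the charset value is an assumption about what an "imperfect" header might look like. This is a sketch, not a full charset-detection routine; it ignores byte order marks and `<meta charset>` tags, for example.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class CharsetDecoder
{
    // Hypothetical helper: decodes raw body bytes with the charset named in the
    // Content-Type header when .NET recognizes it, else with the fallback encoding.
    public static string Decode(byte[] bytes, MediaTypeHeaderValue contentType, Encoding fallback)
    {
        Encoding encoding = fallback;

        // Assumption: an "imperfect" header may quote or pad the charset value,
        // e.g. charset="utf-8", so strip quotes and spaces before looking it up.
        string charset = contentType?.CharSet?.Trim('"', '\'', ' ');

        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep the fallback encoding.
            }
        }

        return encoding.GetString(bytes, 0, bytes.Length);
    }

    public static void Main()
    {
        // Offline demo: UTF-8 encoded Hebrew text with a quoted charset parameter.
        byte[] body = Encoding.UTF8.GetBytes("שלום");
        MediaTypeHeaderValue header = MediaTypeHeaderValue.Parse("text/html; charset=\"utf-8\"");
        Console.WriteLine(Decode(body, header, Encoding.UTF8));
    }
}
```

On .NET Core and .NET 5+, legacy code pages such as windows-1255 (common on older Hebrew sites) are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; without that, this sketch silently uses the fallback for those names.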

1 Answer


The issue is probably due to this bug:

https://connect.microsoft.com/VisualStudio/feedback/details/790174/system-net-http-httpcontent-readasstringasync-does-not-handle-imperfect-content-type-headers

The workaround is to get the response as a byte array and decode it yourself:

// Read the raw bytes and decode them as UTF-8 yourself, bypassing ReadAsStringAsync
var bytes = await content.ReadAsByteArrayAsync();
var s = Encoding.UTF8.GetString(bytes, 0, bytes.Length);

As a side note, is there a reason you're using .Result instead of await? You are blocking the current thread unnecessarily and setting yourself up for deadlocks.
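Combining the byte-array workaround with the side note about `await`, a minimal sketch might look like the following. It awaits every call instead of using `.Result` and decodes the body as UTF-8. The method names (`ReadAsUtf8StringAsync`, `GetUtf8StringAsync`) are hypothetical, the code assumes the body really is UTF-8 (as Fiddler reports for this site), and `async Task Main` assumes C# 7.1 or later. `Main` runs offline against an in-memory `ByteArrayContent` instead of the live site.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical helper: reads the body as raw bytes and decodes them as UTF-8,
    // bypassing ReadAsStringAsync's interpretation of the Content-Type charset.
    public static async Task<string> ReadAsUtf8StringAsync(HttpContent content)
    {
        byte[] bytes = await content.ReadAsByteArrayAsync();
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }

    // Hypothetical helper: fetches a URL with await (no .Result, so the calling
    // thread is not blocked) and returns the body decoded as UTF-8.
    public static async Task<string> GetUtf8StringAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await ReadAsUtf8StringAsync(response.Content);
        }
    }

    public static async Task Main()
    {
        // Offline demo: in-memory content standing in for a response body.
        // Against the live site you would instead call, for example:
        //   using (var client = new HttpClient())
        //       Console.WriteLine(await GetUtf8StringAsync(client, "http://www.wnf.co.il"));
        using (var content = new ByteArrayContent(Encoding.UTF8.GetBytes("שלום")))
        {
            Console.WriteLine(await ReadAsUtf8StringAsync(content));
        }
    }
}
```

If the decoded text is still garbled, the body may not actually be UTF-8, in which case `Encoding.UTF8` has to be replaced with the encoding the server really uses.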

Todd Menier
  • Hey Todd, starting from the end: the main reason is a lack of knowledge on the topic on my part :) Thanks for pointing that out, I will look into it. (Honestly, I do it this way because *somewhere/someone* said `Result` is the same as issuing `await`, and using await has the *annoyance* of requiring the method to be marked `async`, which adds more requirements...) As for my actual problem, I will try what you suggest and let you know what happened. – Veverke Feb 21 '16 at 09:03
  • Todd, according to your answer, if I changed the line in my code that adds the header so that it adds a *non-imperfect* Content-Type header, it should work, shouldn't it? But it does not. – Veverke Feb 21 '16 at 09:06
  • Using your solution (still with `Result` instead of `await`) also does not work; the debugger still shows badly encoded characters. You get the +1 anyway for the overall insights :) – Veverke Feb 21 '16 at 09:08
  • Todd, not closely related to this, but I posted a [question](http://stackoverflow.com/q/35575734/1219280) on the async/await topic you raised. Are you able to answer it? – Veverke Feb 23 '16 at 11:31
  • Looks like you got the help you need on that one? I'd highly recommend checking out [Stephen Cleary's introductory blog series](http://blog.stephencleary.com/2012/02/async-and-await.html) on async/await, it seems you have some very serious misconceptions about how it works, and that's beyond the scope of this question. – Todd Menier Feb 23 '16 at 14:17
  • I came across this material a while ago, when I wanted to learn it. A long time passed without my using async code, so it seems I never really learned it :). Hopefully this changes soon. Thanks. – Veverke Feb 23 '16 at 15:22