
I'm trying to use the WebClient class to view the content of a Hebrew page, but I get gibberish instead of Hebrew.

My code is:

using (WebClient webClient = new WebClient())
{
    webClient.Headers.Add(HttpRequestHeader.ContentType, "charset=windows-1255");
    string page = webClient.DownloadString("http://hebrew-academy.huji.ac.il/Pages/default.aspx");
}

The English content comes through correctly, but the Hebrew content is gibberish. For example:

<title> ׳”׳׳§׳“׳׳™׳” ׳׳׳©׳•׳ ׳”׳¢׳‘׳¨׳™׳× ג€“ ׳“׳£ ׳”׳‘׳™׳×</title>

Does anyone know how to get the Hebrew content correctly?

Matt Ball
Idan P
  • possible duplicate of [ASP.NET / C# WebClient.DownloadString() returns string with perculiar characters](http://stackoverflow.com/questions/4716470/asp-net-c-sharp-webclient-downloadstring-returns-string-with-perculiar-chara) – Matt Ball Sep 05 '13 at 16:35

1 Answer


That page is transmitted as UTF-8, so you should be interpreting it as UTF-8, not as Windows-1255. Do this by setting WebClient.Encoding to System.Text.Encoding.UTF8.
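
As a minimal sketch of that change, assuming the same URL as in the question: the `Content-Type` request header is dropped, because it does not affect how the response is decoded, and `WebClient.Encoding` is set before calling `DownloadString`. The output file name `page.html` is only an illustrative assumption, used here so the result can be checked without depending on the console's code page.

```csharp
using System.IO;
using System.Net;
using System.Text;

class Program
{
    static void Main()
    {
        using (WebClient webClient = new WebClient())
        {
            // Tell WebClient how to decode the downloaded bytes into a string.
            webClient.Encoding = Encoding.UTF8;

            string page = webClient.DownloadString(
                "http://hebrew-academy.huji.ac.il/Pages/default.aspx");

            // Hypothetical output file, written as UTF-8 so the Hebrew text
            // can be inspected in an editor or browser.
            File.WriteAllText("page.html", page, Encoding.UTF8);
        }
    }
}
```

If the Hebrew still looks wrong when printed to a console window, the console's own output encoding may be the cause rather than the download.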

Community
Matt Ball
  • Same result when I use: webClient.Headers.Add(HttpRequestHeader.ContentType, "utf-8"); – Idan P Sep 05 '13 at 15:47
  • [`Content-Type` when used as a request header only specifies the content type of data being sent in the request, for POST and PUT.](http://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Requests) See my edit for the fix. – Matt Ball Sep 05 '13 at 16:32