1

I'm trying to download a page as a string with `WebClient`. The page contains Japanese characters, but the result shows characters like these: ,�^�p�Ǘ�.

var url= "http://www.itmedia.co.jp/im/articles/0609/14/news117.html";

using (var w = new WebClient())
{
   w.Encoding = Encoding.UTF8;
   var htmlData= w.DownloadString(url);
}

The value of htmlData doesn't show Japanese characters.

Can you explain why the text isn't decoded as Japanese characters even though I set the encoding to UTF-8?

Dean
  • 23
  • 7
  • 1
    Download in Firefox and do "Tools" -> "Page Info" and you will see the encoding is actually Shift_JIS. See [this answer](http://stackoverflow.com/a/30049848/3744182) for how to make `WebClient` detect the encoding automatically. – dbc Dec 14 '16 at 05:55
  • @dbc ^ that is totally the best answer. thanks man. – Dean Dec 15 '16 at 03:22

3 Answers

1

According to the third line of view-source, the page is encoded in Shift_JIS:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja" id="masterChannel-enterprise"><head>
<meta http-equiv="content-type" content="text/html;charset=shift_jis">
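The meta tag itself is made of ASCII characters, so it can usually be located before the real encoding is known. A minimal sketch of that idea, assuming the charset is declared within the first kilobyte of the page: download the raw bytes, read the prefix as ASCII, and decode the whole body with whatever charset is declared. `CharsetSniffer`, `Detect`, and `DownloadHtml` are hypothetical names, and the `RegisterProvider` call is only needed on .NET Core / .NET 5+, where Shift_JIS is not available by default.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetSniffer
{
    // Matches e.g. charset=shift_jis inside a meta tag.
    static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    static CharsetSniffer()
    {
        // Makes code-page encodings such as shift_jis available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the encoding declared in the first 1024 bytes, or the fallback.
    public static Encoding Detect(byte[] body, Encoding fallback)
    {
        int prefixLength = Math.Min(body.Length, 1024);
        string prefix = Encoding.ASCII.GetString(body, 0, prefixLength);

        Match m = CharsetPattern.Match(prefix);
        if (!m.Success) return fallback;

        try { return Encoding.GetEncoding(m.Groups[1].Value); }
        catch (ArgumentException) { return fallback; } // unknown charset name
    }

    // Downloads raw bytes first, then decodes them with the detected encoding.
    public static string DownloadHtml(string url)
    {
        using (var w = new WebClient())
        {
            byte[] body = w.DownloadData(url);
            return Detect(body, Encoding.UTF8).GetString(body);
        }
    }
}
```

This only inspects the document body; a server may also declare the charset in the `Content-Type` response header, which this sketch ignores.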
John Machin
  • 81,303
  • 11
  • 141
  • 189
  • But you can't get to that line if you don't have the encoding right. Try to imagine what you would do if you were the browser. The code is UTF-8 encoded. – Patrick Hofman Dec 14 '16 at 06:27
0

If you open the page with Postman, you can see the headers of the response.

Postman

As you can see in the picture, the response is compressed with gzip. That is probably causing the scrambled response you see.

`WebClient` nowadays supports decompressing gzip automatically, but that wasn't always the case. (If I run your code on .NET 4.6.2 on Windows 10, I do get the right results.) You might be targeting an older version of the .NET Framework that doesn't support gzip decompression out of the box. The linked post should solve that.
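One common way to turn on decompression explicitly, not necessarily the approach from the linked post, is to subclass `WebClient` and set `AutomaticDecompression` on the underlying request. The class name `GzipWebClient` is made up for this sketch.

```csharp
using System;
using System.Net;

class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose the decompression setting.
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}
```

It can be used in place of `new WebClient()` in the question's code. The `Encoding` property still has to match the page's charset; decompression alone does not fix a wrong encoding.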

Community
  • 1
  • 1
Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
0

I changed the encoding from UTF-8 to shift_jis.

w.Encoding = Encoding.GetEncoding("shift_jis");
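For reference, a sketch of the question's snippet with that one-line change applied. `ShiftJisDownload` and its method names are hypothetical. The `RegisterProvider` call is an assumption about the runtime: on .NET Core / .NET 5+ Shift_JIS has to be registered first, while on the classic .NET Framework the encoding is built in and that line can be dropped.

```csharp
using System;
using System.Net;
using System.Text;

static class ShiftJisDownload
{
    // Returns the Shift_JIS encoding, registering code-page support if needed.
    public static Encoding GetShiftJis()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis");
    }

    // Downloads the page and decodes the response bytes as Shift_JIS.
    public static string Download(string url)
    {
        using (var w = new WebClient())
        {
            w.Encoding = GetShiftJis();
            return w.DownloadString(url);
        }
    }
}
```

Called as `ShiftJisDownload.Download(url)` with the URL from the question, this hard-codes the encoding, so it only suits pages known to be Shift_JIS.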
Dean
  • 23
  • 7