
I would like to open a web site and read its HTML source, so I wrote this code:

WebClient client = new WebClient();
string htmlCode = client.DownloadString("http://www.varzesh3.com");

But I got garbage data. I also added this code, but it still does not work:

client.Encoding = Encoding.UTF8;
client.Headers.Add("charset", "utf-8");

In addition, I tried decoding the raw bytes with each of these encodings, but none of them worked:

byte[] raw = client.DownloadData("http://www.varzesh3.com");

string webData1 = Encoding.ASCII.GetString(raw);
string webData2 = Encoding.BigEndianUnicode.GetString(raw);
string webData3 = Encoding.Unicode.GetString(raw);
string webData4 = Encoding.UTF32.GetString(raw);
string webData5 = Encoding.UTF7.GetString(raw);
string webData6 = Encoding.UTF8.GetString(raw);

Note: I can open and read any other website that uses the Persian (Farsi) language, but I cannot read www.varzesh3.com. Could you please help me?

  • My guess is that that web site is misconfigured so that its headers don't match its content... – Jon Skeet Jan 02 '16 at 19:49
  • Thank you for guiding me, but how can I fix this problem? – user3703112 Jan 02 '16 at 19:51
  • Well you can download the raw bytes instead, and try to work out what encoding you *should* use... – Jon Skeet Jan 02 '16 at 19:52
  • I used the code below, but none of the encodings worked :( : byte[] raw = client.DownloadData("http://www.varzesh3.com"); string webData1 = Encoding.ASCII.GetString(raw); string webData2 = Encoding.BigEndianUnicode.GetString(raw); string webData3 = Encoding.Unicode.GetString(raw); string webData4 = Encoding.UTF32.GetString(raw); string webData5 = Encoding.UTF7.GetString(raw); string webData6 = Encoding.UTF8.GetString(raw); – user3703112 Jan 02 '16 at 20:02
  • Please update your question rather than just adding comments. – Jon Skeet Jan 02 '16 at 20:02
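One way to act on the suggestion to inspect the raw bytes is to check whether the body is compressed rather than mis-encoded. The following is only a sketch: the class and method names (`GzipSniffer`, `LooksGzipped`, `DecodeBody`) are made up for illustration, and UTF-8 is an assumption about the page's text encoding.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the original question.
public static class GzipSniffer
{
    // A gzip stream always begins with the two magic bytes 0x1F 0x8B (RFC 1952).
    public static bool LooksGzipped(byte[] raw) =>
        raw != null && raw.Length >= 2 && raw[0] == 0x1F && raw[1] == 0x8B;

    // Decompress when the gzip signature is present, then decode the text.
    // Assumption: the page text itself is UTF-8.
    public static string DecodeBody(byte[] raw)
    {
        if (!LooksGzipped(raw))
            return Encoding.UTF8.GetString(raw);

        using (var input = new MemoryStream(raw))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the `raw` array from the question, this could be called as `string html = GzipSniffer.DecodeBody(raw);`. If the first two bytes are not `0x1F 0x8B`, the problem lies elsewhere and this helper simply falls back to plain UTF-8 decoding.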

2 Answers


The response from that site is compressed, so you need to decompress it before decoding it as text. More info here. Using the custom MyWebClient, you will have:

using (var client = new MyWebClient { Encoding = Encoding.UTF8 })
{
    var test = client.DownloadString("http://www.varzesh3.com/");
}
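The definition of `MyWebClient` is not shown here. A minimal sketch of what such a class might look like, assuming it is a `WebClient` subclass that turns on automatic decompression for each request, is:

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical reconstruction: the answer above does not include this class.
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        if (request is HttpWebRequest http)
        {
            // Ask the framework to unpack gzip/deflate bodies before
            // DownloadString applies the configured Encoding.
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

The override keeps the rest of `WebClient` unchanged, so `DownloadString` and `DownloadData` can be called exactly as before.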
VahidN

This happens because the website uses gzip to compress its output. You should decompress it before reading it as text:

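// Requires: using System.IO; using System.IO.Compression; using System.Net.Http;
// Must run inside an async method.
using (var hc = new HttpClient())
using (var stream = await hc.GetStreamAsync(@"http://www.varzesh3.com/"))
// GZipStream throws InvalidDataException if the body is not actually gzip data.
using (var gzstream = new GZipStream(stream, CompressionMode.Decompress))
using (var reader = new StreamReader(gzstream))
{
    var text = await reader.ReadToEndAsync();
    // do what you want with text
}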
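As a variation on the code above, `HttpClient` can also be asked to decompress the response itself through `HttpClientHandler.AutomaticDecompression`. This is a sketch only: `PageFetcher` and `FetchAsync` are illustrative names, and it assumes the server labels its compressed responses with a `Content-Encoding` header, which may not hold for a misconfigured site.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class PageFetcher
{
    public static async Task<string> FetchAsync(string url)
    {
        var handler = new HttpClientHandler
        {
            // Let the handler unpack gzip/deflate responses transparently.
            AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        // Disposing the client also disposes the handler it owns.
        using (var client = new HttpClient(handler))
        {
            return await client.GetStringAsync(url);
        }
    }
}
```

From an async method this could be called as `var text = await PageFetcher.FetchAsync("http://www.varzesh3.com/");`. Unlike wrapping the stream in `GZipStream` by hand, this version should also cope with responses that are not compressed at all.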
Hamid Pourjam