
I'm trying to get the HTML source of a webpage using the WebRequest class. It works fine in most cases, but with this specific website the resulting string is completely meaningless.

    static void Func()
    {
        WebRequest wr = WebRequest.Create(url);
        wr.ContentType = "text/html";
        wr.UseDefaultCredentials = true;
        WebResponse response = wr.GetResponse();

        string s;
        using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8, true))
        {
            s = sr.ReadToEnd();
        }
        using (StreamWriter sw = File.CreateText(@"./test.txt"))
        {
            string[] lines = s.Split(Environment.NewLine);
            foreach (var line in lines)
            {
                sw.WriteLine(line);
            }
        }

        response.Close();
    }

The result is a test.txt file containing lines like:

"붿嵪뿯庽붿뿯岽獛윛纕問뿯쎽྘怞ȍ붿뿯붿뿯Ⓗ뿯붿뿯붿붿뿯疽붿붿〡붿ㄅ〴뿯붿붿뿯撽뿯撽뿯붿붿뿯붿퉚";

I've already tried different encodings, but nothing worked. This is the website link: "http://www.tsetmc.com/loader.aspx?ParTree=151311&i=33854964748757477"

Parsa97
  • You haven't given the website, but from experience, this is probably a site which is ignoring your `Accept-Encoding` header, and is giving you a gzip-encoded response (Amazon does this, for example). You can see this from the response `Content-Encoding` header. There are ways to make HttpClient/WebRequest automatically decompress gzip-encoded responses. – canton7 Apr 25 '20 at 11:38
  • Looks like you are getting Unicode. The webpage may default to Unicode, and you need to add a header to the request to get a different response. You may need to add a language to the header. – jdweng Apr 25 '20 at 11:39
  • I checked that website, and it returns a gzip-encoded response regardless of what `Accept-Encoding` header you send it, as I suspected. See the duplicate. – canton7 Apr 25 '20 at 12:04
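
The comments above point to a gzip-compressed body being read as if it were plain UTF-8 text. The following is a minimal sketch under that assumption, not a confirmed fix for this particular site. The class `GzipDemo` and the helpers `FetchHtml`, `DecodeGzip`, and `EncodeGzip` are hypothetical names introduced here for illustration. `FetchHtml` relies on `HttpWebRequest.AutomaticDecompression`, while `DecodeGzip` shows the manual route with `GZipStream`. `Main` only runs a local round trip, so it makes no network call.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

static class GzipDemo
{
    // Hypothetical helper: asks the framework to undo gzip/deflate
    // before the response stream reaches the StreamReader.
    // Note: WebRequest is marked obsolete on newer .NET versions.
    public static string FetchHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: manually decompresses a gzip body,
    // assuming the decompressed text is UTF-8.
    public static string DecodeGzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper used only to build sample data for the demo.
    public static byte[] EncodeGzip(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Local round trip: compress a sample string, then decompress it.
        byte[] body = EncodeGzip("<html>hello</html>");
        Console.WriteLine(DecodeGzip(body));
    }
}
```

In the question's `Func`, the equivalent change would be to cast the result of `WebRequest.Create(url)` to `HttpWebRequest` and set `AutomaticDecompression` before calling `GetResponse()`. Checking the response's `Content-Encoding` header first is a cheap way to confirm whether compression is actually the cause.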

0 Answers