I use the code below to get the page source, but it doesn't return the proper data:

string url = "http://www.tsetmc.com/Loader.aspx?ParTree=15131F";
WebClient client = new WebClient();
client.Headers["Accept-Encoding"] = "gzip";
string pageSource = client.DownloadString(url);

The website's Content-Encoding is gzip.

SajjadZare
  • That page is dynamically generated by scripts. You cannot get its constructed result by downloading a stream; you need a WebBrowser to interpret and execute the scripts. So, navigate to the page with a headless browser and get the rendered page. – Jimi Jan 06 '20 at 11:23
  • @Jimi I did it with CefSharp, but I need to do it in a class library (I need to write a Windows service that does it regularly), not in a graphical form (like Windows Forms), and because of that I tried to do it with WebClient. – SajjadZare Jan 06 '20 at 12:05
  • You don't need the graphical interface. That's why I wrote *headless browser*. You need a WebBrowser class, not the UI control. Try with the standard WebBrowser class and handle the DocumentCompleted event. Beware of IFrames. Read the notes here [How to get an HtmlElement value inside Frames/IFrames?](https://stackoverflow.com/a/53218064/7444103) – Jimi Jan 06 '20 at 12:42
  • This may also come in handy: [WebBrowser Control in a new thread](https://stackoverflow.com/a/4271581/7444103) (needs to be handled **carefully**). This other implementation, too: [Close Application after WebBrowser print](https://stackoverflow.com/a/57349052/7444103) (a more specific use case, it also shows the use of the native ActiveX) – Jimi Jan 06 '20 at 12:49
  • Also apply these modifications to the WB compatibility mode, in case you haven't: [How can I get the WebBrowser control to show modern contents?](https://stackoverflow.com/a/38514446/7444103) – Jimi Jan 06 '20 at 12:56

1 Answer

By setting client.Headers["Accept-Encoding"] = "gzip"; you are asking the server to send a compressed response, but nothing decompresses it. DownloadString therefore decodes the raw gzip bytes as text, which produces the incorrect result.
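
To make the missing step concrete, here is a minimal sketch of gzip decompression done by hand with GZipStream. The names GzipSketch, Compress and Decompress are hypothetical and only for illustration; the sketch also assumes the body is UTF-8 text, which may not hold for every site.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of WebClient or any library.
static class GzipSketch
{
    // Compresses text roughly the way a server does when it honors "Accept-Encoding: gzip".
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    // Turns a gzip-compressed body back into text (assumes UTF-8 content).
    public static string Decompress(byte[] body)
    {
        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With a plain WebClient, the raw bytes could be fetched with client.DownloadData(url) and passed to GzipSketch.Decompress, assuming the server really answered with a gzip body.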

As per https://stackoverflow.com/a/4914874/23633, you can get WebClient to automatically decompress responses by modifying the HttpWebRequest it creates:

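using System;
using System.Net;

// WebClient subclass that transparently decompresses gzip/deflate responses.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = (HttpWebRequest) base.GetWebRequest(address);
        // Sends the Accept-Encoding header and decompresses the response body.
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}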

string url = "http://www.tsetmc.com/Loader.aspx?ParTree=15131F";
WebClient client = new MyWebClient();
// don't set the Accept-Encoding header here; it will be done automatically
string pageSource = client.DownloadString(url);
Bradley Grainger