2

When I open this Excel file link in my browser, the file downloads successfully. But when I download it with the following C# code

private void downloadFile()
{
    string remoteUri = "http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0";
    string fileName = @"g:\temp.xlsx";

    using (var client = new WebClient())
    {
        client.DownloadFile(remoteUri, fileName);
    }
}

and open the result from File Explorer, I get a file format error:


What is wrong with my code?

Bob

2 Answers

2

The response is gzip-compressed, so decompress the stream and then write the result to the file.

        // Requires: using System.IO; using System.IO.Compression; using System.Net;
        string remoteUri = "http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0";
        string fileName = @"g:\temp.xlsx";

        using (var client = new WebClient())
        {
            // Open the raw (still compressed) response body.
            using var stream = client.OpenRead(remoteUri);
            // Wrap it so that reads return decompressed bytes.
            using var zipStream = new GZipStream(stream, CompressionMode.Decompress);
            using var resultStream = new MemoryStream();
            zipStream.CopyTo(resultStream);
            File.WriteAllBytes(fileName, resultStream.ToArray());
        }
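
If a reusable downloader is preferable to decompressing by hand, one option is a small `WebClient` subclass that switches on automatic decompression for its HTTP requests. The following is only a sketch of that idea: the class name `GZipWebClient` is hypothetical, and it assumes `WebClient` is still acceptable in your project (recent .NET versions mark it obsolete in favor of `HttpClient`).

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that asks the underlying HTTP request
// to decompress gzip/deflate response bodies transparently.
public class GZipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression;
        // other request types (e.g. file://) are returned unchanged.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

With this in place, `new GZipWebClient().DownloadFile(remoteUri, fileName)` should write the already-decompressed bytes, so the rest of the original code can stay as it is.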
Mustafa Arslan
  • How do you know that the file is compressed and needs manual decompression? Why doesn't WebClient do the decompression automatically? The compression part should be a silent part of the download, the client shouldn't say that it accepts compressed contents if it doesn't and the server shouldn't respond with compressed content if the client doesn't say that it accepts it. So again, how do you know the file is compressed? – Lasse V. Karlsen Sep 09 '20 at 18:07
  • @LasseV.Karlsen This is a problem-specific solution. By default, `WebClient` does not decompress automatically. For a general-purpose downloader, we can inherit from `WebClient` and override the `GetWebRequest` method like this: https://stackoverflow.com/questions/2973208/automatically-decompress-gzip-response-via-webclient-downloaddata – Mustafa Arslan Sep 09 '20 at 18:17
  • But does the WebClient say that it *accepts* compressed content? – Lasse V. Karlsen Sep 09 '20 at 18:44
  • @LasseV.Karlsen It is possible to do the decompression silently. It all depends on how your client is set up. For example, in my answer I have set up the HttpClient to decompress content automatically via a handler. With that in place, the HttpClient can automatically decompress the data and provide it to you. No need to worry about whether the response was compressed or not by the server. – Durga Prasad Sep 09 '20 at 18:45
  • My point is that serving compressed content is a contract between the client and the server. If the client says that it can accept compressed encoding, and the server supports it, it can serve compressed content, saying that it is compressed. If the WebClient class doesn't decompress silently, why does it say that it accepts compressed content? And if it doesn't, why does the server serve compressed content? Either way, there are two answers here just saying "The server is serving compressed content" and my question is still how you definitely know that. – Lasse V. Karlsen Sep 09 '20 at 18:46
  • Not as in "If you inspect the headers you can learn that the server served compressed content", I'm asking how **you**, the two of you that answered this, know that **this** server serves compressed content? In this particular case? – Lasse V. Karlsen Sep 09 '20 at 18:47
  • @LasseV.Karlsen I did not know it until I inspected the response headers. I took the sample code and ran it inside a console application to figure out the actual response that was provided by the server. Otherwise the only other option would have been to go through the documentation of the API, which the OP has not provided. – Durga Prasad Sep 09 '20 at 18:50
  • 1
    Then shouldn't that part of the mystery be part of the answer? In the future, if someone comes here wondering why WebClient is producing corrupt files, how will they know whether compression or not is the answer in their case? – Lasse V. Karlsen Sep 09 '20 at 18:51
  • I still contend that the server in this case is in error. WebClient is not going to state that it accepts compressed content if it does not deal with decompression, so if the server is serving compressed content it is in breach of contract. It should serve it uncompressed. But that is completely beside the point. – Lasse V. Karlsen Sep 09 '20 at 18:54
1

If you look at the response headers returned for remoteUri, you will notice that this endpoint serves its content in gzip-compressed form:

Content-Encoding: gzip

Response Headers Snap
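
For readers who want to check this themselves rather than rely on a screenshot, here is a rough sketch of two checks: reading the `Content-Encoding` header programmatically, and looking at the first two bytes of a file that has already been saved. The class and method names (`DownloadInspector`, `GetContentEncodingAsync`, `LooksLikeGzip`, `LooksLikeZipContainer`) are made up for illustration. The sketch assumes a default `HttpClient`, which does not decompress automatically and therefore leaves the header visible.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical diagnostic helpers; none of these names come from the question.
public static class DownloadInspector
{
    // Returns the Content-Encoding values of the response, e.g. "gzip",
    // or an empty string when the server sent no such header.
    public static async Task<string> GetContentEncodingAsync(string url)
    {
        using (var client = new HttpClient())
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            return string.Join(", ", response.Content.Headers.ContentEncoding);
        }
    }

    // gzip data starts with the two magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // ZIP containers, which is what a real .xlsx file is, start with "PK" (0x50 0x4B).
    public static bool LooksLikeZipContainer(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x50 && data[1] == 0x4B;
    }
}
```

For example, `DownloadInspector.LooksLikeGzip(File.ReadAllBytes(@"g:\temp.xlsx"))` returning true would suggest that the saved file is still gzip-compressed and was never decoded on the way to disk.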

So the content you get back is not a plain Excel file but a gzip-compressed one. For this piece of code to work as written, the file name should be temp.zip instead of temp.xlsx, and the saved file then has to be extracted with a tool that understands gzip.

private void downloadFile()
{
    string remoteUri = "http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0";
    string fileName = @"g:\temp.zip";

    using (var client = new WebClient())
    {
        client.DownloadFile(remoteUri, fileName);
    }
}

Having said that, a better approach is to decompress inline while downloading. Create an HttpClient instance, passing in an HttpClientHandler whose AutomaticDecompression property is set to DecompressionMethods.GZip | DecompressionMethods.Deflate, so that compressed responses are decompressed automatically. Then read the data and save it to temp.xlsx.

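// Requires: using System.IO; using System.Net; using System.Net.Http;
string remoteUri = "http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0";
string fileName = @"g:\temp.xlsx";

// Let the handler decompress gzip/deflate response bodies transparently.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};

using var client = new HttpClient(handler);
using var response = await client.GetAsync(remoteUri);
// Fail on a non-success status instead of saving an error page as .xlsx.
response.EnsureSuccessStatusCode();

// The bytes read here are already decompressed.
byte[] fileContent = await response.Content.ReadAsByteArrayAsync();
File.WriteAllBytes(fileName, fileContent);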
Durga Prasad
  • Thanks for your response. But the main question is: when you click on the link, the browser downloads and saves an Excel file. What is the difference between my code and the browser's behavior? – Bob Sep 09 '20 at 17:56
  • 2
    A browser does a lot more than just download the content. It looks at the response headers and automatically does the decompression for you. – Durga Prasad Sep 09 '20 at 18:00
  • @VDWWD There is a subtle difference, temp.zip instead of temp.xlsx, which makes all the difference. – Durga Prasad Sep 09 '20 at 18:05
  • I did not see that. Sorry. – VDWWD Sep 09 '20 at 18:05
  • How do you know it is compressed? – Lasse V. Karlsen Sep 09 '20 at 18:15
  • We can look at the headers of the response for that information. To be more specific, the 'Content-Encoding' header in the response had a value of gzip for the request in the question. – Durga Prasad Sep 09 '20 at 18:27