
I am trying to download a zip file and unzip it while showing a progress bar. My code looks roughly like this:

var handler = new HttpClientHandler() { AllowAutoRedirect = true };
var ph = new ProgressMessageHandler(handler);
ph.HttpReceiveProgress += (_, args) => { GetProgress(args.ProgressPercentage); };
var httpClient = new HttpClient(ph);
var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead, cancellationToken.Token);
response.EnsureSuccessStatusCode();
using (var zipInputStream = new ZipInputStream(await response.Content.ReadAsStreamAsync()))
{
    while (zipInputStream.GetNextEntry() is { } zipEntry)
    {
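        var entryFileName = zipEntry.Name;
        // Assumption: entries are extracted under pathToNewFile, the folder used later in this question.
        var fullZipToPath = Path.Combine(pathToNewFile, entryFileName);
        var buffer = new byte[4096];
        var directoryName = Path.GetDirectoryName(fullZipToPath);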
        if (directoryName?.Length > 0)
        {
            Directory.CreateDirectory(directoryName);
        }
        if (Path.GetFileName(fullZipToPath).Length == 0)
        {
            continue;
        }
        using (var streamWriter = File.Create(fullZipToPath))
        {
            StreamUtils.Copy(zipInputStream, streamWriter, buffer);
        }
    }
}

My problem is that with ResponseHeadersRead instead of ResponseContentRead, ProgressMessageHandler does not report progress. With ResponseContentRead I can see the progress incrementing correctly.

It also works fine if I use ResponseHeadersRead and copy the stream directly to a file, as below.

await using (var fs = new FileStream(pathToNewFile + "/test.zip", FileMode.Create))
{
    await response.Content.CopyToAsync(fs);
}

But it feels wasteful to download the zip to a temp file and then unzip it with another stream, when I could pass the response stream directly to ZipInputStream as I do above. I suspect I am misunderstanding how ZipInputStream or ResponseHeadersRead should be used. Does ZipInputStream require the entire stream to be loaded at once, while ResponseHeadersRead downloads the stream gradually, so that I cannot pass the stream directly like that?

Please let me know whether this is bad usage or whether I am missing something.

EDIT: The problem seems to be that StreamUtils.Copy is synchronous, so progress is only reported once that line has finished executing, by which point it is already at 100%. ZipInputStream doesn't seem to provide an async option for copying the stream to a file, so I probably need to find an alternative.

EDIT 2: I have changed the code to use the built-in ZipArchive, but its extraction is also synchronous:

using (var zipArchive = new ZipArchive(fileStream, ZipArchiveMode.Read))
{
    zipArchive.ExtractToDirectory(directoryName, true);
}

EDIT 3, working solution: as I said, if I first copy the response to a FileStream and write it out as a zip file

await using (var fs = new FileStream(pathToNewFile + "/test.zip", FileMode.Create))
{
    await response.Content.CopyToAsync(fs);
}

and then open that zip file as a stream and pass it to ZipArchive as below, it works and I can see the progress.

var fileToDecompress = new FileInfo(pathToNewFile + "/test.zip");
using (var fileStream = fileToDecompress.OpenRead())
using (var zipArchive = new ZipArchive(fileStream, ZipArchiveMode.Read))
{
    zipArchive.ExtractToDirectory(directoryName, true);
}

Charlieface
Emil
  • Maybe use this answer https://stackoverflow.com/a/46497896/14868997 – Charlieface Jul 10 '23 at 16:02
  • Which `ProgressMessageHandler` are you using? .NET Core has no such class [and no plans to add one](https://github.com/dotnet/runtime/issues/16681). Googling only shows classes from ASP.NET – Panagiotis Kanavos Jul 10 '23 at 16:20
  • The correct way to do this would be to use `IProgress` and publish a progress message every time things moved forward. If you read data from a stream in a loop, you can report progress in that loop. The code you posted also tries to compress the downloaded data on the fly, which is another time consuming operation that requires progress reporting – Panagiotis Kanavos Jul 10 '23 at 16:28
  • @PanagiotisKanavos what do you mean by compress? I am trying to unzip. I thought that Data is first downloaded and unzipped via stream. Do you mean that writing to temp Zip file and reading it again with another stream is better? Anyhow I have changed my code using builtin ZipArchive like in my edits. Do you mean that using like on Edit 3 is better option? – Emil Jul 10 '23 at 17:07
  • HTTP response compression doesn't use ZIP. ZIP isn't a stream compression format. The standard compression algorithms are deflate, GZIP, Brotli and a couple more. HttpClient handles this already. – Panagiotis Kanavos Jul 10 '23 at 17:12
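
The `IProgress` approach suggested in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's code: `CopyWithProgressAsync` is a hypothetical helper name, and the in-memory streams only stand in for the HTTP response stream and the output file so that the sketch runs without a network call.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Stand-ins for the response stream and the destination file (assumption for the demo).
var source = new MemoryStream(new byte[100_000]);
var destination = new MemoryStream();

// Progress<T> invokes the callback on the captured SynchronizationContext
// (the UI thread in a UI app) or on the thread pool in a console app,
// so the callbacks may run slightly after the copy itself.
var progress = new Progress<long>(bytes => Console.WriteLine($"Copied {bytes} bytes so far"));

await CopyWithProgressAsync(source, destination, progress);

// Hypothetical helper: reads in chunks and reports the running total after each chunk.
static async Task CopyWithProgressAsync(
    Stream source,
    Stream destination,
    IProgress<long> progress,
    CancellationToken cancellationToken = default)
{
    var buffer = new byte[81920];
    long total = 0;
    int read;
    while ((read = await source.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
    {
        await destination.WriteAsync(buffer.AsMemory(0, read), cancellationToken);
        total += read;
        progress.Report(total);
    }
}
```

Because the loop awaits each read and write, the calling thread is free between chunks, which is what lets a UI repaint while the copy is running.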

1 Answer


As you have found, the UI will not update if the copying is done synchronously.

Unfortunately, there is no async version of ExtractToDirectory yet. There is an open GitHub issue for this.

In the meantime, you can use the following code. Most of it is taken from the original source code:

public static async ValueTask ExtractToDirectoryAsync(
  this ZipArchive source,
  string destinationDirectoryName,
  bool overwriteFiles,
  CancellationToken cancellationToken = default
)
{
    var extractPath = Path.GetFullPath(destinationDirectoryName);

    // Ensures that the last character on the extraction path is the directory separator char.
    // Without this, a malicious zip file could try to traverse outside of the expected extraction path.
    if (!extractPath.AsSpan().EndsWith(new ReadOnlySpan<char>(in Path.DirectorySeparatorChar), StringComparison.Ordinal))
        extractPath += Path.DirectorySeparatorChar;

    Directory.CreateDirectory(extractPath);

    foreach (var entry in source.Entries)
    {
        // Gets the full path to ensure that relative segments are removed.
        var destinationPath = Path.GetFullPath(Path.Combine(extractPath, entry.FullName));

        if (!destinationPath.StartsWith(extractPath, StringComparison.Ordinal))
            throw new IOException($"Entry {entry.FullName} has path outside {destinationDirectoryName}");

        if (Path.GetFileName(destinationPath).Length == 0)
        {
            // If it is a directory:
            if (entry.Length != 0)
                throw new IOException("Entry is directory with data");

            Directory.CreateDirectory(destinationPath);
        }
        else
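        {
            // If it is a file, make sure its containing directory exists first.
            Directory.CreateDirectory(Path.GetDirectoryName(destinationPath)!);
            await entry.ExtractToFileAsync(destinationPath, overwriteFiles, cancellationToken);
        }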
    }
}

public static async ValueTask ExtractToFileAsync(
  this ZipArchiveEntry source,
  string destinationFileName,
  bool overwrite,
  CancellationToken cancellationToken = default
)
{
    FileStreamOptions fileStreamOptions = new()
    {
        Access = FileAccess.Write,
        Mode = overwrite ? FileMode.Create : FileMode.CreateNew,
        Share = FileShare.None,
        BufferSize = 0x1000,
    };

    const UnixFileMode OwnershipPermissions =
        UnixFileMode.UserRead | UnixFileMode.UserWrite | UnixFileMode.UserExecute |
        UnixFileMode.GroupRead | UnixFileMode.GroupWrite | UnixFileMode.GroupExecute |
        UnixFileMode.OtherRead | UnixFileMode.OtherWrite | UnixFileMode.OtherExecute;

    // Restore Unix permissions.
    // For security, limit to ownership permissions, and respect umask (through UnixCreateMode).
    // We don't apply UnixFileMode.None because .zip files created on Windows and .zip files created
    // with previous versions of .NET don't include permissions.
    var mode = (UnixFileMode)(source.ExternalAttributes >> 16) & OwnershipPermissions;
    if (mode != UnixFileMode.None && !OperatingSystem.IsWindows())
    {
        fileStreamOptions.UnixCreateMode = mode;
    }

    await using (var fs = new FileStream(destinationFileName, fileStreamOptions))
    await using (var es = source.Open())
    {
        await es.CopyToAsync(fs, cancellationToken);
    }
    File.SetLastWriteTime(destinationFileName, source.LastWriteTime.DateTime);
}
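
To illustrate why the extraction path is forced to end with a directory separator before the `StartsWith` check, here is a small sketch. `IsInside` is a hypothetical helper that mirrors the check in the code above, and the `extract` folder and entry names are made up for the example.

```csharp
using System;
using System.IO;

// Hypothetical extraction root, resolved against the current directory.
var root = Path.GetFullPath("extract");
var rootWithSeparator = root + Path.DirectorySeparatorChar;

Console.WriteLine(IsInside(rootWithSeparator, "docs/readme.txt"));           // True
Console.WriteLine(IsInside(rootWithSeparator, "../evil.txt"));               // False
Console.WriteLine(IsInside(rootWithSeparator, "../extract-other/evil.txt")); // False

// Without the trailing separator, a sibling folder whose name merely starts
// with "extract" slips through the prefix check.
Console.WriteLine(IsInside(root, "../extract-other/evil.txt"));              // True

// Same check as above: resolve relative segments, then compare prefixes.
static bool IsInside(string extractPath, string entryName)
{
    var destinationPath = Path.GetFullPath(Path.Combine(extractPath, entryName));
    return destinationPath.StartsWith(extractPath, StringComparison.Ordinal);
}
```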

Note that if the base stream is not seekable, ZipArchive will synchronously buffer it into a MemoryStream. To avoid that, you can buffer it yourself:

var mem = new MemoryStream();
await yourStream.CopyToAsync(mem, someCancellationToken);
using var zip = new ZipArchive(mem); // ZipArchive is IDisposable, so a plain using is enough
await zip.ExtractToDirectoryAsync(......
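
Putting the pieces together, an end-to-end flow might look like the sketch below. It is self-contained for illustration only: a small zip built in memory stands in for the downloaded response stream, the target is a random folder under the temp directory, the extraction loop is inlined rather than calling the extension methods above, and progress is written to the console once per entry. All of these are assumptions of the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

// Build a tiny zip in memory to stand in for the response stream (demo assumption).
var download = new MemoryStream();
using (var writer = new ZipArchive(download, ZipArchiveMode.Create, leaveOpen: true))
{
    foreach (var name in new[] { "a.txt", "sub/b.txt" })
    {
        await using var entryStream = writer.CreateEntry(name).Open();
        await entryStream.WriteAsync(Encoding.UTF8.GetBytes($"content of {name}"));
    }
}
download.Position = 0;

// Buffer asynchronously into a seekable stream before handing it to ZipArchive.
var mem = new MemoryStream();
await download.CopyToAsync(mem);
mem.Position = 0;

// Hypothetical target folder; the trailing separator guards the prefix check below.
var extractRoot = Path.GetFullPath(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()))
                  + Path.DirectorySeparatorChar;
Directory.CreateDirectory(extractRoot);

using var zip = new ZipArchive(mem, ZipArchiveMode.Read);
var done = 0;
foreach (var entry in zip.Entries)
{
    var destinationPath = Path.GetFullPath(Path.Combine(extractRoot, entry.FullName));
    if (!destinationPath.StartsWith(extractRoot, StringComparison.Ordinal))
        throw new IOException($"Entry {entry.FullName} has path outside {extractRoot}");

    if (Path.GetFileName(destinationPath).Length == 0)
    {
        // Directory entry.
        Directory.CreateDirectory(destinationPath);
    }
    else
    {
        // File entry: ensure the parent folder exists, then copy asynchronously.
        var parent = Path.GetDirectoryName(destinationPath);
        if (!string.IsNullOrEmpty(parent))
            Directory.CreateDirectory(parent);

        await using var fs = new FileStream(destinationPath, FileMode.Create,
            FileAccess.Write, FileShare.None, 4096, useAsync: true);
        await using var es = entry.Open();
        await es.CopyToAsync(fs);
    }

    done++;
    Console.WriteLine($"Extracted {done}/{zip.Entries.Count}: {entry.FullName}");
}
```

Per-entry reporting is coarse for archives with a few large files. For finer-grained progress, the `CopyToAsync` call could be replaced with a chunked read/write loop that reports bytes written.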
Charlieface