Zipping a stream on the fly without using the disk

Question

I'm trying to write a method in my C# MVC project that streams a file from S3 (or anywhere) and compresses it into a zip file on the fly before sending the compressed stream to the user. So far I've found several ways to create a zip file from a stream by saving it to disk and then returning it normally, but I'd like to skip saving to disk and stream through a buffer instead. The file I'm trying to download is very large (4 GB+) but compresses easily to a fraction of its original size.

So far I have this, which avoids the disk but seems to load the entire file into memory first:

using( var memoryStream = new MemoryStream() )
{
    using( var archive = new ZipArchive( memoryStream, ZipArchiveMode.Create, true ) )
    {
        var zipEntry = archive.CreateEntry( File );

        using( var entryStream = zipEntry.Open() )
        {
            S3.StreamFile( File, Bucket ).CopyTo( entryStream );
        }
    }

    return base.File( memoryStream.ToArray(), "application/zip", File + ".zip" );
}

A similar question (Creating a ZIP Archive in Memory Using System.IO.Compression) only has answers that involve writing to disk.

Possible duplicate of [Creating a ZIP Archive in Memory Using System.IO.Compression](https://stackoverflow.com/questions/17232414/creating-a-zip-archive-in-memory-using-system-io-compression) — Dialecticus, Sep 17 '18 at 21:55
*but seems to load the entire file into memory first* - You mean `S3.StreamFile`? What's your evidence for that? [`Stream.CopyTo()`](https://referencesource.microsoft.com/#mscorlib/system/io/stream.cs,295ec16c77d4fafb) uses a copy buffer size of `81920`. Can you share a [mcve]? — dbc, Sep 17 '18 at 22:00
You need to create a wrapper (Facade pattern) around a Stream object that tracks the stream position. Then you will not need any MemoryStream. You also need to write to the output stream directly and return an EmptyResult — Kalten, Sep 17 '18 at 22:03

score 7 · Accepted Answer · answered Sep 17 '18 at 22:22

The ZipArchive class requires a stream that reports its current position. The TrackablePositionStream class below tracks the position by incrementing a field on every write call:

using System;
using System.IO;

// Write-only wrapper that counts the bytes written, so ZipArchive can read
// Position even when the underlying stream does not support it.
public class TrackablePositionStream : Stream
{
    private readonly Stream _stream;

    // Number of bytes written through this wrapper so far.
    private long _position = 0;

    public TrackablePositionStream(Stream stream)
    {
        this._stream = stream;
    }

    public override void Flush()
    {
        this._stream.Flush();
    }

    // Seeking, resizing, and reading are not supported by this wrapper.
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Advance the tracked position, then forward the bytes unchanged.
        this._position += count;
        this._stream.Write(buffer, offset, count);
    }

    // Capabilities are reported from the underlying stream.
    public override bool CanRead => this._stream.CanRead;

    public override bool CanSeek => this._stream.CanSeek;

    public override bool CanWrite => this._stream.CanWrite;

    public override long Length => this._stream.Length;

    public override long Position
    {
        get
        {
            return this._position;
        }
        set
        {
            throw new NotSupportedException();
        }
    }
}

Then use it in your action method:

using( var archive = new ZipArchive(new TrackablePositionStream(response.OutputStream), ZipArchiveMode.Create, true ) )
{
    var zipEntry = archive.CreateEntry( File );

    using(var entryStream = zipEntry.Open() )
    {
        S3.StreamFile( File, Bucket ).CopyTo( entryStream );
    }
}

return new EmptyResult();
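
To try the idea outside a web project, here is a minimal console-style sketch. It is an illustration under stated assumptions, not the code from the question: ForwardOnlySink is a hypothetical stand-in for a response output stream (write-only, no Seek, no Position), CountingWriteStream is a compact variant of the wrapper above, and ZipStreamingDemo.RoundTrip is a made-up helper that zips a string through the wrapper and reads it back to check the archive is valid.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical stand-in for a forward-only output such as an HTTP response body:
// it accepts writes but supports neither seeking nor Position/Length.
public sealed class ForwardOnlySink : Stream
{
    private readonly MemoryStream _captured = new MemoryStream();

    // Exposes the bytes written so far, only so the demo can verify them.
    public byte[] ToArray() => _captured.ToArray();

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => _captured.Write(buffer, offset, count);
}

// Compact position-tracking wrapper (same idea as TrackablePositionStream).
public sealed class CountingWriteStream : Stream
{
    private readonly Stream _inner;
    private long _position;

    public CountingWriteStream(Stream inner) { _inner = inner; }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _position;
    public override long Position
    {
        get => _position;
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count)
    {
        _position += count;                 // track bytes written
        _inner.Write(buffer, offset, count); // forward unchanged
    }
}

public static class ZipStreamingDemo
{
    // Zips `text` as a single entry into a forward-only sink, then reopens the
    // captured bytes as a zip archive and returns the decompressed entry text.
    public static string RoundTrip(string text)
    {
        var sink = new ForwardOnlySink();

        using (var archive = new ZipArchive(new CountingWriteStream(sink), ZipArchiveMode.Create, true))
        {
            var entry = archive.CreateEntry("sample.txt");
            using (var entryStream = entry.Open())
            using (var source = new MemoryStream(Encoding.UTF8.GetBytes(text)))
            {
                source.CopyTo(entryStream); // copied in chunks, not all at once
            }
        }

        using (var readBack = new ZipArchive(new MemoryStream(sink.ToArray()), ZipArchiveMode.Read))
        using (var reader = new StreamReader(readBack.GetEntry("sample.txt").Open(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling ZipStreamingDemo.RoundTrip with any string should give the same string back, which suggests the archive written through the forward-only wrapper is readable. The sink buffers in memory only so the result can be checked; in the action method the bytes would go straight to the response instead.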

Great solution without using external libraries!! Works very well. — Valerio Natangelo, Oct 26 '20 at 09:39
Why are we pulling `CanRead` and `CanSeek` from the underlying stream, when we know that we never support either of those? Shouldn't we return a constant `false` instead? — Lily Finley, May 18 '23 at 16:39
Because this is what is used by ZipArchive to know if it can use Position property or not. — Kalten, May 19 '23 at 15:30
