Why do I get a corrupted *.gz file after creating it with C#?

Question

I’m developing a .NET 5, C# console application to create a CSV file from a list of custom objects, gzip it, and upload it to an Azure Storage container with this code:

var blobServiceClient = new BlobServiceClient("My connection string");
var containerClient = GetBlobContainerClient("My container name");
var config = new CsvConfiguration(CultureInfo.CurrentCulture) { Delimiter = ";", Encoding = Encoding.UTF8 };

var list = new List<FakeModel>
{
   new FakeModel { Field1 = "A", Field2 = "B" },
   new FakeModel { Field1 = "C", Field2 = "D" }
};

await using var memoryStream = new MemoryStream();
await using var streamWriter = new StreamWriter(memoryStream);
await using var csvWriter = new CsvWriter(streamWriter, config);

await csvWriter.WriteRecordsAsync(list);

await using var zip = new GZipStream(memoryStream, CompressionMode.Compress, true);
await memoryStream.CopyToAsync(zip);

memoryStream.Seek(0, SeekOrigin.Begin);
var blockBlob = containerClient.GetBlockBlobClient("test.csv.gz");
await blockBlob.UploadAsync(memoryStream);

It appears to work, but when I download the gzip file from the cloud to check it, I get an error when trying to decompress it.

Inspection of the file shows it has a length of 0.

Can you help me understand why?

The CopyToAsync method may not be completing before the Upload is complete. — jdweng, Jun 11 '21 at 10:23
If you open the .gz file in a hex editor (e.g. [Can I hex edit a file in Visual Studio?](https://stackoverflow.com/q/1724586/1115360)), does it look like a .gz file? — Andrew Morton, Jun 11 '21 at 10:24
@AndrewMorton if I open it like that, I get `00000000 ` as content. — Pine Code, Jun 11 '21 at 10:31
I guess GZipStream should be in your writer-stack between MemoryStream and StreamWriter. So, CsvWriter -> StreamWriter -> GZipStream -> MemoryStream — Fildor, Jun 11 '21 at 10:38

score 2 · Accepted Answer · answered Jun 11 '21 at 11:38

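The stream parameter of a new GZipStream in CompressionMode.Compress is the destination stream: the compressed bytes end up there. To get the input compressed, you need to write the uncompressed data to the GZipStream instance itself. In the question's code the same MemoryStream is used as both the source and the destination.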

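When I experimented with it, I found that a call to csvWriter.FlushAsync() was also necessary: until the writers are flushed, the CSV text is still sitting in their buffers and has not reached the MemoryStream, so there is nothing to copy.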

Like this:

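using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;
using CsvHelper;
using CsvHelper.Configuration;

class Program
{
    public class FakeModel
    {
        public string Field1 { get; set; }
        public string Field2 { get; set; }
    }

    static async Task Main(string[] args)
    {
        var config = new CsvConfiguration(CultureInfo.CurrentCulture) { Delimiter = ";", Encoding = Encoding.UTF8 };

        var list = new List<FakeModel>
        {
            new FakeModel { Field1 = "A", Field2 = "B" },
            new FakeModel { Field1 = "C", Field2 = "D" }
        };

        // Write the uncompressed CSV into a MemoryStream first.
        await using var memoryStream = new MemoryStream();
        await using var streamWriter = new StreamWriter(memoryStream);
        await using var csvWriter = new CsvWriter(streamWriter, config);

        await csvWriter.WriteRecordsAsync(list);
        // Push the buffered CSV text through to memoryStream.
        await csvWriter.FlushAsync();

        // Rewind so that CopyToAsync reads from the start of the CSV data.
        memoryStream.Position = 0;

        // The GZipStream wraps the destination (here a file); the CSV is copied into it.
        await using var fs = new FileStream(@"C:\temp\SO67935249.csv.gz", FileMode.Create);
        await using var zip = new GZipStream(fs, CompressionMode.Compress, true);
        await memoryStream.CopyToAsync(zip);
    }
}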
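
Why the FlushAsync call matters can be seen in isolation. The following is a minimal sketch using only StreamWriter and MemoryStream (no CsvHelper); the class name FlushDemo and the method name Measure are made up for the illustration. It records the length of the MemoryStream before and after flushing the writer.

```csharp
using System;
using System.IO;
using System.Text;

static class FlushDemo
{
    // Hypothetical helper: returns the MemoryStream length before and after Flush().
    public static (long Before, long After) Measure()
    {
        using var memoryStream = new MemoryStream();
        // UTF-8 without a byte order mark, so only the three payload bytes are written.
        using var streamWriter = new StreamWriter(memoryStream, new UTF8Encoding(false), 1024, leaveOpen: true);

        streamWriter.Write("A;B");
        // The text is still in the writer's internal buffer at this point.
        long before = memoryStream.Length;

        streamWriter.Flush();
        // Now the bytes have reached the underlying stream.
        long after = memoryStream.Length;

        return (before, after);
    }

    static void Main()
    {
        var (before, after) = Measure();
        Console.WriteLine($"before flush: {before}, after flush: {after}");
        // prints "before flush: 0, after flush: 3"
    }
}
```

In the question's code nothing had been flushed when CopyToAsync ran, so the MemoryStream was still empty, which matches the zero-length file that was uploaded.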

I think the example could be simplified by writing to the GZipStream directly, i.e. `new StreamWriter(new GZipStream(...))`. Also, `using var ...` only does the cleanup when the method ends; in this case a scoped using statement might be better, and I think that should avoid the need for explicit flushing. — JonasH, Jun 11 '21 at 11:45
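
The layering suggested in the comments (writer -> GZipStream -> MemoryStream, with scoped using statements) might look roughly like the sketch below. To keep it free of external packages it writes the CSV lines by hand with a StreamWriter; with CsvHelper, a CsvWriter would presumably wrap that StreamWriter as in the question. The names GzipCsvSketch, Compress and Decompress are made up for the illustration, and Decompress is there only to check the round trip.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipCsvSketch
{
    // Writes the lines through StreamWriter -> GZipStream -> MemoryStream.
    public static MemoryStream Compress(string[] lines)
    {
        var output = new MemoryStream();

        // Scoped usings: leaving the block disposes the writer and then the GZipStream,
        // which flushes the buffered text and finishes the gzip data.
        // leaveOpen: true keeps the MemoryStream usable afterwards.
        using (var zip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        using (var writer = new StreamWriter(zip, new UTF8Encoding(false)))
        {
            foreach (var line in lines)
            {
                writer.Write(line);
                writer.Write('\n');
            }
        }

        // Rewind so that a consumer (for example an upload call) reads from the start.
        output.Position = 0;
        return output;
    }

    // Reads a gzip stream back into text, used here to verify the result.
    public static string Decompress(Stream compressed)
    {
        using var zip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true);
        using var reader = new StreamReader(zip, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    static void Main()
    {
        using var compressed = Compress(new[] { "Field1;Field2", "A;B", "C;D" });
        Console.WriteLine(compressed.Length > 0);
        Console.WriteLine(Decompress(compressed) == "Field1;Field2\nA;B\nC;D\n");
    }
}
```

With this arrangement no explicit flush is needed, because disposing the writer and the GZipStream at the end of the using block pushes everything into the MemoryStream. The rewound stream returned by Compress is what would then be handed to blockBlob.UploadAsync in the question's code.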
