Azure doesn't allow me to unzip a file directly in a container. I downloaded a zip file and now need to extract the CSV file inside it. What I get is a 0-byte file. I can download the zip to my local computer and see the embedded CSV, so the zip file isn't corrupt. I get no errors, just a zero-byte output file. What am I doing wrong? I've tried all of these options unsuccessfully:

using (MemoryStream ms = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(ms);
    using (var zipStream = new GZipStream(ms, CompressionMode.Decompress))
    {
        CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) +  ".csv");
        unzippedBlob.Properties.ContentType = "text/csv";
        using (Stream outputFileStream = await unzippedBlob.OpenWriteAsync())
        {
            await zipStream.CopyToAsync(outputFileStream);
            outputFileStream.Flush();
        }
    }
}

2nd try:

using (MemoryStream ms = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(ms);
    using (var zipStream = new GZipStream(ms, CompressionMode.Decompress))
    {
        CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv");
        unzippedBlob.Properties.ContentType = "text/csv";
        await unzippedBlob.UploadFromStreamAsync(zipStream);
    }
}

3rd try:

using (MemoryStream ms = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(ms);
    using (var zipStream = new GZipStream(ms, CompressionMode.Decompress))
    {
        CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv");
        unzippedBlob.Properties.ContentType = "text/csv";
        using (Stream outputFileStream = await unzippedBlob.OpenWriteAsync())
        {
            await zipStream.CopyToAsync(outputFileStream);
            outputFileStream.Flush();
        }
    }
}

4th try:

using (MemoryStream ms = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(ms);
    using (DeflateStream decompressionStream = new DeflateStream(ms, CompressionMode.Decompress))
    {
        CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv");
        unzippedBlob.Properties.ContentType = "text/csv";
        using (Stream outputFileStream = await unzippedBlob.OpenWriteAsync())
        {
            await decompressionStream.CopyToAsync(outputFileStream);
            outputFileStream.Flush();
        }
    }
}

5th try:

using (var inputStream = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(inputStream);
    inputStream.Seek(0, SeekOrigin.Begin);

    using (var gzStream = new GZipStream(inputStream, CompressionMode.Decompress))
    {
        using (var outputStream = new MemoryStream())
        {
            gzStream.CopyTo(outputStream);
            byte[] outputBytes = outputStream.ToArray(); // No data. Sad panda. :'(
            string output = Encoding.ASCII.GetString(outputBytes);
            CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv");
            unzippedBlob.Properties.ContentType = "text/csv";
            await unzippedBlob.UploadTextAsync(output);
        }
    }
}

6th try:

using (var ms = new MemoryStream())
{
    await zipOutputBlob.DownloadToStreamAsync(ms);
    ms.Seek(0, SeekOrigin.Begin);

    using (DeflateStream decompressionStream = new DeflateStream(ms, CompressionMode.Decompress))
    {
        using (var outputStream = new MemoryStream())
        {
            decompressionStream.CopyTo(outputStream);
            byte[] outputBytes = outputStream.ToArray(); // No data. Sad panda. :'(
            string output = Encoding.ASCII.GetString(outputBytes);
            CloudBlockBlob unzippedBlob = container.GetBlockBlobReference(String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv");
            unzippedBlob.Properties.ContentType = "text/csv";
            await unzippedBlob.UploadTextAsync(output);
        }
    }
}

Options 5 and 6 fail with this error message on the CopyTo call:

System.Private.CoreLib: Exception while executing function: DefinitiveHealthCare. System.IO.Compression: The archive entry was compressed using an unsupported compression method.

How is this done?

I'm not sure this will ever be found via Google, since the question was closed as a duplicate, but I think my solution could help someone in the future:

// Requires System.IO, System.IO.Compression and the Azure Storage blob types used below.
private static async Task UnzipDefinitiveFile(string fileName, CloudBlobContainer container, Logger logger, DateTime lastWriteTime, CloudBlockBlob zipOutputBlob)
{
    using (MemoryStream blobMemStream = new MemoryStream())
    {
        // Pull the whole zip blob into memory.
        await zipOutputBlob.DownloadToStreamAsync(blobMemStream);
        // ZipArchive understands the .zip container format and exposes its entries.
        using (ZipArchive archive = new ZipArchive(blobMemStream))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                logger.Send(SeverityLevel.Verbose, $"Now processing {entry.FullName}");
                // Skip anything that is not the expected CSV.
                if (entry.FullName != Path.GetFileNameWithoutExtension(fileName) + ".csv")
                    continue;
                string validName = String.Format("{0:yyyy-MM-dd}", lastWriteTime) + " " + Path.GetFileNameWithoutExtension(fileName) + ".csv";
                CloudBlockBlob blockBlob = container.GetBlockBlobReference(validName);
                blockBlob.Properties.ContentType = "text/csv";
                // entry.Open() yields the decompressed content of the entry.
                using (var fileStream = entry.Open())
                {
                    await blockBlob.UploadFromStreamAsync(fileStream);
                }
            }
        }
    }
}

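The change here is using the ZipArchive class and looping through the entries in the archive. GZipStream and DeflateStream decode raw gzip and deflate streams, not the .zip container format, which is why the earlier attempts did not produce the CSV. In my case there was only one file in the archive, so I had tried to skip this step, which did not work out well.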
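For anyone who wants to try the ZipArchive approach without an Azure account, here is a minimal in-memory sketch. It is not the code from my function: the class name ZipInMemoryDemo, the helpers BuildZip, LooksLikeZip and ReadEntry, and the entry name passed in are all made up for illustration, and a byte array stands in for the downloaded blob.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class for illustration only; no Azure SDK involved.
public static class ZipInMemoryDemo
{
    // Builds a .zip archive in memory holding one text entry
    // (a stand-in for the bytes of the downloaded blob).
    public static byte[] BuildZip(string entryName, string content)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = archive.CreateEntry(entryName);
                using (Stream entryStream = entry.Open())
                {
                    byte[] bytes = Encoding.UTF8.GetBytes(content);
                    entryStream.Write(bytes, 0, bytes.Length);
                }
            }
            return buffer.ToArray();
        }
    }

    // A non-empty .zip file starts with the local file header signature "PK\x03\x04",
    // whereas a gzip stream starts with 0x1F 0x8B. Checking the first bytes tells you
    // whether ZipArchive or GZipStream is the right tool.
    public static bool LooksLikeZip(byte[] data)
    {
        return data.Length >= 4
            && data[0] == 0x50 && data[1] == 0x4B
            && data[2] == 0x03 && data[3] == 0x04;
    }

    // Opens the archive and returns the named entry's text, or null if it is absent.
    public static string ReadEntry(byte[] zipBytes, string entryName)
    {
        using (var input = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(input, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return null;
            }
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling ReadEntry on the output of BuildZip with the same entry name should give back the original text. In the Azure case, the stream returned by entry.Open() would be handed to the blob upload instead of a StreamReader.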

Another option is to use Azure Data Factory V2. Its Copy Data activity can load a zipped file, and it can download a file from an SFTP location.

user2197446
  • Does this answer your question? [How do you unzip a gz file in memory using GZipStream?](https://stackoverflow.com/questions/42817059/how-do-you-unzip-a-gz-file-in-memory-using-gzipstream) – madreflection Jan 16 '20 at 21:36
  • I'm using .NET Core 2.2 and Azure. Pulling the file out of the Azure container, unzipping it in a memory stream, and then writing it back out makes this just different enough to warrant a separate question. Plus, when I tried the approach from the link you gave, it failed with an unsupported compression method error. I've found multiple questions about similar unzip problems, but none with all these variables. – user2197446 Jan 17 '20 at 18:22
