We have a requirement to extract large .zip files (around 3-4 GB each) from one blob container into another blob container; the extracted files are JSON files (around 35-50 GB in size).
For the implementation we referred to the code from this link: https://msdevzone.wordpress.com/2017/07/07/extract-a-zip-file-stored-in-azure-blob/. With it we can extract smaller files (a 40 MB zip unzipping to 400 MB) in a few minutes, but the process gets stuck for more than an hour with a 2 GB zip extracting to 30 GB of JSON files.
Could anyone suggest a better solution they have come across for this scenario that does not use file operations?
Below is the code we have been working with, for reference:
using System;
using System.IO;
using System.IO.Compression;
using Microsoft.WindowsAzure.Storage.Blob;

CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);

BlobRequestOptions options = new BlobRequestOptions();
options.ServerTimeout = new TimeSpan(0, 20, 0);

// Save the blob (zip file) contents to a MemoryStream.
using (MemoryStream zipBlobFileStream = new MemoryStream())
{
    blockBlob.DownloadToStream(zipBlobFileStream, null, options);
    zipBlobFileStream.Flush();
    zipBlobFileStream.Position = 0;

    // Use ZipArchive from System.IO.Compression to extract all the files from the zip.
    using (ZipArchive zip = new ZipArchive(zipBlobFileStream, ZipArchiveMode.Read, true))
    {
        // Each entry here represents an individual file or a folder.
        foreach (var entry in zip.Entries)
        {
            // Create an empty block blob with the same name as the entry.
            var blob = extractcontainer.GetBlockBlobReference(entry.FullName);
            using (var stream = entry.Open())
            {
                // Check for file vs. folder and upload the actual content from the stream.
                if (entry.Length > 0)
                    blob.UploadFromStream(stream);
            }
        }
    }
}
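
Since the current code buffers the entire zip in a MemoryStream (which also cannot grow past roughly 2 GB), the direction we are considering is to stream the archive straight from the source blob instead. The sketch below is untested at the 30 GB scale and assumes, as we understand the SDK, that CloudBlockBlob.OpenRead returns a seekable stream that ZipArchive can read lazily; container, extractcontainer and filename are the same variables as above, and the StreamMinimumReadSizeInBytes setting is only our guess at reducing the number of range reads:

CloudBlockBlob sourceBlob = container.GetBlockBlobReference(filename);

BlobRequestOptions options = new BlobRequestOptions
{
    ServerTimeout = TimeSpan.FromMinutes(20)
};

// Larger read buffer so ZipArchive's many small reads turn into fewer range requests.
sourceBlob.StreamMinimumReadSizeInBytes = 16 * 1024 * 1024;

// OpenRead gives a seekable stream over the blob, so ZipArchive can walk the
// central directory and decompress entries without downloading the whole zip first.
using (Stream zipStream = sourceBlob.OpenRead(null, options))
using (ZipArchive zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
{
    foreach (ZipArchiveEntry entry in zip.Entries)
    {
        if (entry.Length == 0)
            continue; // folder entry, nothing to upload

        CloudBlockBlob targetBlob = extractcontainer.GetBlockBlobReference(entry.FullName);

        // entry.Open() is a forward-only decompression stream; UploadFromStream
        // copies it to the destination blob as the bytes are produced.
        using (Stream entryStream = entry.Open())
        {
            targetBlob.UploadFromStream(entryStream, null, options);
        }
    }
}

The idea is that memory usage stays bounded by the read/write buffers rather than the archive size, since each entry's decompressed bytes go straight from entry.Open() into UploadFromStream. Whether this actually performs better than downloading the zip up front, or whether there is a better pattern altogether, is what we would like input on.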