I have a foreach loop that adds files from a directory into a zip archive. It runs fine for a while, but if too many files go into one archive, the memory usage keeps rising until eventually I get an OutOfMemoryException and it all falls over. Here is the relevant snippet of code:

//there is a load of code before this, but the important declarations look like this:
public class MyFileInfo
{
    public string FileName { get; set; }
    public DateTime ReferenceDate { get; set; }
    public string ArchiveLocation { get; set; }
    public string ArchiveName { get; set; }
    public string ArchiveBackupLocation { get; set; }
}
public class ArchiveFiles
{
    public string Name { get; set; }
    public HashSet<MyFileInfo> FileList { get; set; }
}

var distinctArchives = new Dictionary<string, MyTools.ArchiveFiles>();

//----------------------------------------------------
//-- The loop that is causing problems is below here--
//----------------------------------------------------

TotalFileCount = AllFilesToArchive.Count();
TotalArchiveCount = distinctArchives.Count();
CurrentFileCount = 0;

foreach (var archive in distinctArchives)
{
    //reset the per-archive file count before starting this new archive; note that CurrentFileCount is declared outside the loop and does not get reset.
    FilesProcessedIntoThisArchive = 0;
    TotalFilesInThisArchive = archive.Value.FileList.Count();
    CurrentArchiveName = archive.Key;
    string currentBackupName = archive.Value.Name;

    //make folders for the archives to go in
    if (!Directory.Exists(Path.GetDirectoryName(archive.Key)))
    {
        BasicTools.CreateFolderWithParentsAccess(Path.GetDirectoryName(archive.Key));
    }

    CreateEmptyArchive_atSource(CurrentArchiveName);

    //HERE IS THE REAL PROBLEM: the memory builds up within these "using" blocks
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    {
        using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
        {
            foreach (var file in archive.Value.FileList)
            {
                //if the user pressed abort on the progress dialog, then we need to stop this before we start any more files into the archive
                if (bgWorker.CancellationPending)
                {
                    break;
                }
                //we gotta set some of our values into properties for use of the progress bar
                FilesProcessedIntoThisArchive++;
                CurrentFileCount++;
                CurrentFileName = file.FileName;
                decimal progressAmount = ((decimal)CurrentFileCount / TotalFileCount) * 100; //cast so integer division cannot truncate the percentage to zero

                try
                {
                    //so we already have the archive, we just need to add the file to the archive.
                    ZipArchiveEntry entry = archiveZip.CreateEntryFromFile(Path.Combine(SourceDirectory, CurrentFileName), CurrentFileName, CompressionLevel.Optimal);
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Problem adding the file to the archive: " + ex.Message);
                    continue;
                }
                //if we have been asked to delete the files, we better do it.
                if (DeleteFiles)
                {
                    File.Delete(Path.Combine(SourceDirectory, CurrentFileName)); //same path the entry was created from
                }
                //tell the user the latest news
                bgWorker.ReportProgress((int)progressAmount);
            }
        }
    }
}
FilesProcessedCount = (int)CurrentFileCount;

The problem is that the memory for the archive is not released until the `foreach (var file in archive.Value.FileList)...` loop has completed; if this loop has many iterations then the memory usage becomes too large.

I tried swapping the nesting around so that the `using (FileStream zipToOpen...` and `using (ZipArchive...` parts are inside the foreach loop, meaning the memory is released after each iteration. That fixed the memory problem, but it made the performance so bad that the app was unusable.
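(For clarity, that slow variant looked roughly like this - a sketch only, with the progress and cancellation bookkeeping left out:)

foreach (var file in archive.Value.FileList)
{
    //opening and disposing per file keeps the memory flat, but Update mode
    //rereads and rewrites the whole archive on every dispose - hence the slowness
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        archiveZip.CreateEntryFromFile(Path.Combine(SourceDirectory, file.FileName), file.FileName, CompressionLevel.Optimal);
    }
}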

Ideally I would like a method to batch the files into groups of some `int batchSize`, and clear down the memory for the archive after each batch, but I can't see how to pick up from where I got to in the foreach loop of files if I do this... Do I have to change to using a while loop to do this..?

All the suggestions I have found online say that I should be using the `using` structure to dispose memory, but in this case that is what is letting me down. I need a sort of halfway house where I can go through a set number of foreach iterations, then exit the `using` structure for a moment before diving back in and resuming the foreach loop... I'm a bit confused.
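Something like this is what I am picturing (a rough sketch only - the `batchSize` value and the snapshot list are made up for illustration, and it needs `using System.Linq;` for `Skip`/`Take`):

int batchSize = 1000; //hypothetical value; tune for memory use vs. speed
var files = archive.Value.FileList.ToList(); //snapshot so the batches can index into it

for (int start = 0; start < files.Count; start += batchSize)
{
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        //resume exactly where the previous batch left off
        foreach (var file in files.Skip(start).Take(batchSize))
        {
            archiveZip.CreateEntryFromFile(Path.Combine(SourceDirectory, file.FileName), file.FileName, CompressionLevel.Optimal);
        }
    } //both usings dispose here, flushing this batch to disk and releasing the memory
}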

What do you think I should do?

High Plains Grifter
  • See this: https://stackoverflow.com/questions/29983151/out-of-memory-exception-while-updating-zip-in-c-net – PaulF Jul 17 '18 at 16:12
  • by the way, I read this (which looks like it could be relevant), but thought it was not relevant to my question... maybe wrongly... https://stackoverflow.com/questions/41378746/disposing-during-foreach – High Plains Grifter Jul 17 '18 at 16:13
  • The mentioned question has an answer, but not a direct mention of documentation: *"When you set the mode to Update, the underlying file or stream must support reading, writing, and seeking. The content of the entire archive is held in memory, and no data is written to the underlying file or stream until the archive is disposed."* https://msdn.microsoft.com/en-us/library/system.io.compression.ziparchivemode(v=vs.110).aspx – Sami Kuhmonen Jul 17 '18 at 16:14
  • As for the actual disposing, you don't need to use `using` always. You can open the stream etc and handle the closing/disposing in the end, for example with try/finally. This allows you to keep count of how many files you've handled and close/dispose and reopen the archive when you want in the middle. – Sami Kuhmonen Jul 17 '18 at 16:19
  • 1
    @SamiKuhmonen this sounds like a more efficient solution - so I can open the archive, foreach through the first 10K files or whatever, commit those to the archive file and repeat... I think that may be a good compromise of speed and memory. I'll see what I can do there and post if I can make it work – High Plains Grifter Jul 17 '18 at 16:28
  • I'm really struggling here... I tried putting all the `zipToOpen` stuff into a method which I could then call, running for a set number of steps before moving on, and removing each dictionary item on the way, but that gives errors because the dictionary changes while looping... I tried disposing the archive, but could not then continue to use it. I cannot see a way to commit the files to the archive without ending the loops. Do I need to use a `SortedDictionary` or something? – High Plains Grifter Jul 18 '18 at 15:04
  • I put all the code from `using (FileStream...` onwards in a method and called that within a while loop, adding a new flag to the file items called "HasBeenProcessed" and re-finding all the unprocessed files each time with a LINQ `.Where(o=>o.HasBeenProcessed==false)` clause. This seems to work, although the memory usage still gets pretty large..! (a sketch of that pattern is below) – High Plains Grifter Jul 19 '18 at 07:31
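A minimal sketch of the pattern that last comment describes, for anyone following along - the `HasBeenProcessed` flag, the `ProcessBatchIntoArchive` name, and `batchSize` are illustrative, not confirmed code (needs `using System.Linq;`):

//MyFileInfo gains a bool HasBeenProcessed { get; set; } (hypothetical)
void ProcessBatchIntoArchive(string archivePath, HashSet<MyFileInfo> fileList, int batchSize)
{
    //materialise the next batch so the collection is not modified mid-iteration
    var batch = fileList.Where(o => o.HasBeenProcessed == false).Take(batchSize).ToList();

    using (FileStream zipToOpen = new FileStream(archivePath, FileMode.Open))
    using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        foreach (var file in batch)
        {
            archiveZip.CreateEntryFromFile(Path.Combine(SourceDirectory, file.FileName), file.FileName, CompressionLevel.Optimal);
            file.HasBeenProcessed = true; //mark it so the outer loop terminates
        }
    } //disposing here flushes this batch to disk and frees the Update-mode buffer
}

//caller: keep going until every file in this archive has been handled
while (archive.Value.FileList.Any(o => o.HasBeenProcessed == false))
{
    ProcessBatchIntoArchive(archive.Key, archive.Value.FileList, batchSize);
}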

0 Answers