I have a foreach loop that adds files from a directory into a zip archive. It runs fine for a while, but if there are too many files going into one archive, the memory usage keeps rising until eventually I get OutOfMemoryExceptions and it all falls over. Here is the relevant snippet of code:
//there is a load of code before this, but the important declarations look like this:
public class MyFileInfo
{
    public string FileName { get; set; }
    public DateTime ReferenceDate { get; set; }
    public string ArchiveLocation { get; set; }
    public string ArchiveName { get; set; }
    public string ArchiveBackupLocation { get; set; }
}

public class ArchiveFiles
{
    public string Name { get; set; }
    public HashSet<MyFileInfo> FileList { get; set; }
}
var distinctArchives = new Dictionary<string, MyTools.ArchiveFiles>();
//----------------------------------------------------
//-- The loop that is causing problems is below here--
//----------------------------------------------------
TotalFileCount = AllFilesToArchive.Count();
TotalArchiveCount = distinctArchives.Count();
CurrentFileCount = 0;
foreach (var archive in distinctArchives)
{
    //reset the per-archive file count before we start on this new archive. Note that CurrentFileCount is declared outside the loop and does not get reset.
    FilesProcessedIntoThisArchive = 0;
    TotalFilesInThisArchive = archive.Value.FileList.Count();
    CurrentArchiveName = archive.Key;
    string currentBackupName = archive.Value.Name;
    //make folders for the archives to go in
    if (!Directory.Exists(Path.GetDirectoryName(archive.Key)))
    {
        BasicTools.CreateFolderWithParentsAccess(Path.GetDirectoryName(archive.Key));
    }
    CreateEmptyArchive_atSource(CurrentArchiveName);
    //HERE IS THE REAL PROBLEM: the memory builds up within these "using" blocks
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    {
        using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
        {
            foreach (var file in archive.Value.FileList)
            {
                //if the user pressed abort on the progress dialog, stop before we add any more files to the archive
                if (bgWorker.CancellationPending)
                {
                    break;
                }
                //set some values into properties for use by the progress bar
                FilesProcessedIntoThisArchive++;
                CurrentFileCount++;
                CurrentFileName = file.FileName;
                //cast to decimal so this is not integer division (which would always give 0)
                decimal progressAmount = ((decimal)CurrentFileCount / TotalFileCount) * 100;
                try
                {
                    //we already have the archive, we just need to add the file to it
                    ZipArchiveEntry entry = archiveZip.CreateEntryFromFile(Path.Combine(SourceDirectory, CurrentFileName), CurrentFileName, CompressionLevel.Optimal);
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Problem adding the file to the archive: " + ex.Message);
                    continue;
                }
                //if we have been asked to delete the source files, do it now
                if (DeleteFiles)
                {
                    File.Delete(Path.Combine(SourceDirectory, CurrentFileName));
                }
                //tell the user the latest news
                bgWorker.ReportProgress((int)progressAmount);
            }
        }
    }
}
FilesProcessedCount = (int)CurrentFileCount;
The problem is that the memory for the archive is not released until the foreach (var file in archive.Value.FileList) loop completes - if this loop has a lot of iterations, the memory usage becomes too large.
I tried swapping the nesting around so that the using (FileStream zipToOpen...) and using (ZipArchive...) blocks are inside the foreach loop, meaning the memory is released after each iteration. That fixed the memory problem, but it made the performance so bad that the app was unusable.
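For reference, the reworked version I tried looked roughly like this (simplified - I've left out the progress and cancellation code, and the variable names aren't exact):

```csharp
foreach (var file in archive.Value.FileList)
{
    // Opening and disposing the archive for every single file kept the
    // memory flat, but presumably forces the zip to be re-read and
    // re-written on each cycle, which made it far too slow.
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        archiveZip.CreateEntryFromFile(
            Path.Combine(SourceDirectory, file.FileName),
            file.FileName,
            CompressionLevel.Optimal);
    }
}
```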
Ideally I would like a way to batch the files into groups of some int batchSize, and clear down the memory for the archive after each batch, but I can't see how to pick up from where I got to in the foreach loop over the files if I do this... Do I have to change to using a while loop to do this..?
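Something along these lines is what I have in mind, sketched with LINQ's Skip/Take - the batchSize figure is made up and I haven't tested this:

```csharp
int batchSize = 500; // made-up figure; would need tuning
List<MyFileInfo> files = archive.Value.FileList.ToList();

for (int offset = 0; offset < files.Count; offset += batchSize)
{
    // The using blocks now wrap one batch, so the archive is flushed
    // and disposed every batchSize files instead of once at the very end.
    using (FileStream zipToOpen = new FileStream(CurrentArchiveName, FileMode.Open))
    using (ZipArchive archiveZip = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        foreach (var file in files.Skip(offset).Take(batchSize))
        {
            archiveZip.CreateEntryFromFile(
                Path.Combine(SourceDirectory, file.FileName),
                file.FileName,
                CompressionLevel.Optimal);
        }
    }
}
```

But I don't know whether re-opening the archive once per batch would hit the same performance cliff as re-opening it once per file did.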
All the suggestions I have found online say that I should be using the "using" structure to dispose of resources, but in this case that is exactly what is letting me down. I need a sort of halfway house where I can go through a set number of foreach iterations, exit the "using" blocks for a moment, then dive back in and resume the foreach loop... I'm a bit confused.
What do you think I should do?