
Today I noticed that a small program I wrote triggers the GC quite often during the first 10 to 20 seconds of its lifetime, after which it barely triggers again.

During this period only one function runs, shown below. It obtains roughly 2,000 file paths and filters out most of them.

    public static string[] FilterFiles(string path)
    {
        // Fetch the files from given directory
        var files = Directory.GetFiles(path);

        // Delete all files that are too small
        foreach (string file in files)
        {
            string fullFile = default(string);

            try
            {
                fullFile = File.ReadAllText(file);
            }
            catch
            {
                continue;
            }

            if (fullFile.Length < Settings.MinimumFileSize)
            {
                File.Delete(file);
            }
        }

        // Obtain the new list without the small files
        List<string> cleanFiles = new List<string>(Directory.GetFiles(path));
        List<string> cleanReturn = new List<string>(Directory.GetFiles(path));

        // Remove files we have handled before
        foreach (string file in cleanFiles)
        {
            if (File.Exists(Settings.ExtractFolder + "\\" + file.Substring(file.LastIndexOf('\\') + 1) + "_Extract.xml"))
            {
                cleanReturn.Remove(file);
            }
        }

        return cleanReturn.ToArray();
    }

Is it normal for GC to trigger this often in this period of time?

MX D

2 Answers


Well, yes. You are creating a large number of short-lived objects, and each of them becomes garbage for the GC to collect almost immediately.

Try not to read the entire file. Instead, just get the FileInfo to get the file size.

Here you are enumerating the directory listing twice, which is unnecessary too:

List<string> cleanFiles = new List<string>(Directory.GetFiles(path));
List<string> cleanReturn = new List<string>(Directory.GetFiles(path));

Also here, a ton of strings are created due to string concatenation:

Settings.ExtractFolder + "\\" + file.Substring(file.LastIndexOf('\\') + 1) + "_Extract.xml"

Use a StringBuilder or string.Format there, and do as much of the work as possible up front, outside the loop.
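As a rough sketch of building that path with fewer intermediate steps, the helper below uses Path.GetFileName and Path.Combine instead of the StringBuilder or string.Format suggested above, so it is a swapped-in technique rather than the one named in this answer. The class name ExtractPaths and the method GetExtractPath are hypothetical, and the extract folder is passed in as a parameter on the assumption that Settings.ExtractFolder is a plain string.

```csharp
using System;
using System.IO;

public static class ExtractPaths
{
    // Hypothetical helper: maps a source file to the "_Extract.xml" marker
    // file that the question's loop checks for.
    public static string GetExtractPath(string extractFolder, string sourceFile)
    {
        // Path.GetFileName replaces the manual LastIndexOf('\\') + Substring.
        string name = Path.GetFileName(sourceFile);

        // Path.Combine inserts the platform's directory separator.
        return Path.Combine(extractFolder, name + "_Extract.xml");
    }
}
```

Inside the loop, the check would then read File.Exists(ExtractPaths.GetExtractPath(Settings.ExtractFolder, file)).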

Patrick Hofman
    Not reading the entire file but using FileInfo reduced it to a mere 3 GCs. Thanks. – MX D Feb 22 '16 at 18:16

You really don't need to read in an entire file just to find its length. Just do: long length = new FileInfo(file).Length;.

You can enumerate files without reading all the file names into an array too, by using Directory.EnumerateFiles(path).

I think you could rewrite your entire function like so:

public static IEnumerable<string> FilterFiles(string path)
{
    foreach (string file in Directory.EnumerateFiles(path))
    {
        if (new FileInfo(file).Length < Settings.MinimumFileSize)
            File.Delete(file);
        else if (!File.Exists(Settings.ExtractFolder + "\\" + file.Substring(file.LastIndexOf('\\') + 1) + "_Extract.xml"))
            yield return file;
    }
}

And then either use foreach to enumerate all the files like so:

foreach (string file in FilterFiles(myPath))
    ...

Or, if you want to force all the small files to be deleted before you apply the rest of your logic, call ToArray() (which requires using System.Linq) before the foreach:

foreach (string file in FilterFiles(myPath).ToArray())
    ...

But to answer your question: yes, the GC will potentially run often if you create lots of small objects, and especially so if you create large strings. How big are the files you're reading into memory?
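To compare the two approaches without a profiler, one option is to count generation 0 collections around a call. This is only a minimal sketch: GC.CollectionCount is a real .NET API, but the class name GcProbe and the method CountGen0Collections are hypothetical, and the number includes collections caused by any other thread running at the same time.

```csharp
using System;

public static class GcProbe
{
    // Runs the action and returns how many generation 0 collections
    // happened in the process while it was executing.
    public static int CountGen0Collections(Action action)
    {
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }
}
```

For example, GcProbe.CountGen0Collections(() => FilterFiles(myPath).ToArray()) could be run once against the ReadAllText version and once against the FileInfo version.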

Matthew Watson