
I'm resizing heaps of images to 1000x1000 thumbnails in parallel and running out of memory very quickly (the performance profiler puts me at 3 GB of used memory after about 3 minutes).

Originally I was using Image.FromFile(), but after doing some research I found that Image.FromStream() is the way to go. I think I have the appropriate using statements, but something somewhere is still keeping stuff in memory and the GC isn't clearing resources as expected.

It seems like there's an issue with GDI+ keeping the handles open, but I can't seem to find an appropriate solution for my case.

Questions:

  1. Am I doing something completely wrong?
  2. If not, is there a better way to Dispose() of the stream / image / ResizedImage so I'm not eating up all the resources, while still maintaining speedy parallel operations?
  3. If GDI+ is the problem and is keeping unmanaged resources alive, how do I correct the issue?

Code

List<FileInfo> Files contains ~300 valid JPG images, each ~2-4 MB.
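
For context, Files is built up along these lines (the directory path is just a placeholder, not the actual source folder):

    // Requires: System.Collections.Generic, System.IO, System.Linq
    // Placeholder path; the real source folder differs.
    List<FileInfo> Files = new DirectoryInfo(@"C:\Images")
        .EnumerateFiles("*.jpg")
        .ToList();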

Caller

    public void Execute()
    {
        Parallel.ForEach(Files, (file) =>
        {
            Resize.ResizeImage(file.FullName);
        }
        );
    }

Execute() just kicks off a Parallel.ForEach() over the file list.

Resize Class

    using System;
    using System.Diagnostics;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    public static class Resize
    {
        public static void ResizeImage(string fileName)
        {
            ResizeImage(fileName, 1000, 1000, true);
        }

        public static void ResizeImage(string fileName, int newHeight, int newWidth, bool keepAspectRatio = true)
        {
            string saveto = Path.GetDirectoryName(fileName) + @"\Alternate\" + Path.GetFileName(fileName);
            try
            {
                using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
                {
                    using (Image ImageFromStream = Image.FromStream(fs))
                    {
                        var ImageSize = new Size(newWidth, newHeight);
                        if (keepAspectRatio)
                        {
                            // Scale by the smaller of the two ratios so the result fits inside the target box.
                            int oWidth = ImageFromStream.Width;
                            int oHeight = ImageFromStream.Height;
                            double pWidth = ((double)ImageSize.Width / (double)oWidth);
                            double pHeight = ((double)ImageSize.Height / (double)oHeight);
                            double percent;
                            if (pHeight < pWidth)
                                percent = pHeight;
                            else
                                percent = pWidth;
                            newWidth = (int)(oWidth * percent);
                            newHeight = (int)(oHeight * percent);
                        }
                        else
                        {
                            newWidth = ImageSize.Width;
                            newHeight = ImageSize.Height;
                        }
                        var ResizedImage = new Bitmap(newWidth, newHeight);
                        using (Graphics gfxHandle = Graphics.FromImage(ResizedImage))
                        {
                            gfxHandle.InterpolationMode = InterpolationMode.HighQualityBicubic;
                            gfxHandle.DrawImage(ImageFromStream, 0, 0, newWidth, newHeight);
                            if (!Directory.Exists(Path.GetDirectoryName(saveto))) { Directory.CreateDirectory(Path.GetDirectoryName(saveto)); }
                            ResizedImage.Save(saveto, ImageFormat.Jpeg);
                        }
                        ResizedImage.Dispose();
                        ResizedImage = null;
                    }
                }
            }
            catch (Exception ex)
            {
                Debug.WriteLine(string.Format("Exception: {0}", ex.Message));
            }
        }
    }
    I do remember hearing about GDI+ not releasing resources properly, but it was a while ago so I'm sketchy on details. What you could do is force a Garbage Collection cycle using `GC.Collect()`, or otherwise limit the number of parallel resizes using explicit worker threads. Or maybe even both. – Zac Faragher Mar 30 '17 at 02:34
  • @ZacFaragher A `GC.Collect()` won't do anything in this case. I'm watching the performance profiler, and the garbage man is coming to pick stuff up, but it isn't at the curb. You are right about the number of parallel resizes though, you've just pointed me to the solution, thanks. – Adam Vincent Mar 30 '17 at 02:40

1 Answer


This explanation of parallelism points out that my Parallel.ForEach() was creating an overabundance of new tasks because each one was waiting on disk access. At about the 5-minute mark, roughly when the exception was thrown, there were ~160 threads. Reducing the degree of parallelism limits the number of threads created, and with it the number of images sitting in memory waiting to finish loading or writing to disk before they can fall out of scope and be disposed. Setting MaxDegreeOfParallelism = 2 turned out to be the sweet spot for networked disk access: it cut my thread count to around 25 and raised CPU utilization to about 35% (up from 17-24%, which had been dragged down by GC blocking threads and the overhead of too many threads).

    public void Execute()
    {
        Parallel.ForEach(
            Files,
            new ParallelOptions() { MaxDegreeOfParallelism = 2 },
            (file) => { Resize.ResizeImage(file.FullName); });
    }
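
Along the lines of Zac's suggestion about explicit worker threads, the concurrency could also be capped with a SemaphoreSlim throttle instead of ParallelOptions. A rough sketch of that alternative (not what I ended up using; the limit of 2 and the blocking WaitAll at the end are illustrative only):

    // Requires: System.Linq, System.Threading, System.Threading.Tasks
    var throttle = new SemaphoreSlim(2);  // cap concurrent resizes at 2
    var tasks = Files.Select(file => Task.Run(async () =>
    {
        await throttle.WaitAsync();
        try { Resize.ResizeImage(file.FullName); }
        finally { throttle.Release(); }
    })).ToArray();
    Task.WaitAll(tasks);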

Thanks @ZacFaragher.
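
As an aside on question 2: the explicit ResizedImage.Dispose() / null pair isn't strictly needed. The intermediate Bitmap can sit in its own using block alongside the stream, image, and Graphics handle, which guarantees Dispose() runs even if Save() throws. A minimal sketch of the resize body restructured that way (the aspect-ratio math is elided, unchanged from the question):

    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var source = Image.FromStream(fs))
    {
        // ... aspect-ratio calculation as above, producing newWidth/newHeight ...

        using (var resized = new Bitmap(newWidth, newHeight))
        using (var gfx = Graphics.FromImage(resized))
        {
            gfx.InterpolationMode = InterpolationMode.HighQualityBicubic;
            gfx.DrawImage(source, 0, 0, newWidth, newHeight);
            Directory.CreateDirectory(Path.GetDirectoryName(saveto)); // no-op if the folder already exists
            resized.Save(saveto, ImageFormat.Jpeg);
        }
    }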
