1

I have a .NET Core 2.0 that is processing a list of over 30,000 XML files and seems to have a memory leak.

Right now all I have the program doing is reading one file at a time, adding that files contents to a List then moving on to the next file.

Nothing is happening with these files, but my program memory is rising .5 MB per file. The files themselves are anything between 5KB - 10KB.

Here is the process method.

public async Task ProcessModelFile<TModel>(List<string> dataFileList) where TModel : InstitutionModel
    {
        foreach (var file in dataFileList)
        {
            var xmlLoadFile = new XmlLoadFile<TModel>();
            xmlLoadFile.AddFile(file, rootElementName: $"ArrayOf{TextService.GetClassNameFromType<TModel>()}");
        }
    }

And the AddFile method and class:

internal class XmlLoadFile<TModel> where TModel : class
        {

            public List<TModel> ModelList { get; set; }

            internal void AddFile(string file, string rootElementName = "", bool HasHistoricalCutOff = false, DateTime? HistoricalCutOffDate = null)
            {
                using (XmlReader reader = XmlReader.Create(file, new XmlReaderSettings { CheckCharacters = false }))
                {
                    XmlSerializer serializer;

                    if (!string.IsNullOrWhiteSpace(rootElementName))
                    {
                        XmlRootAttribute rootElement = new XmlRootAttribute();
                        rootElement.ElementName = rootElementName;
                        serializer = new XmlSerializer(typeof(List<TModel>), rootElement);
                    }
                    else
                    {
                        serializer = new XmlSerializer(typeof(List<TModel>));
                    }

                    ModelList.AddRange((List<TModel>)serializer.Deserialize(reader));
                }
            }
        }

Every time the foreach loop runs, my memory increases .5MB and with the amount of files I have it goes over 15GB quickly. Debugging in VS and taking snapshots of the Memory it does not account for the memory that is being used.

Garbage Collection does not seem to be releasing this memory properly.

A. Hasemeyer
  • 1,452
  • 1
  • 18
  • 47
  • 1
    Before assuming there's a memory leak in XmlReader check *your* code. How many serializers are getting generated that could be reused? How many times does `ModelList` have to reallocate its internal buffer because it's full? XmlSerializers get cached so the runtime doesn't have to rebuild one when one is already available *only if the correct constructor is used though*. – Panagiotis Kanavos Oct 04 '18 at 16:18
  • if you are adding them to a list then why would the garbage collector collect them while list is in use? – Yahya Hussein Oct 04 '18 at 16:18
  • @YahyaHussein lists store data in arrays. When the array is full, a new one is create with twice the size and the old data are copied. That will result in a *lot* of orphaned objects whose size doubles each time. All those will have to be garbage collected at some point. – Panagiotis Kanavos Oct 04 '18 at 16:20
  • One way to reduce reallocations is to use the `capacity` parameter when creating the list to creat a big enough list to hold most of the expected data. This means you have to guess though. Another idea is to use a List> with a capacity equal to the number of files. Once processing is done, flatten the list eg with `SelectMany()` – Panagiotis Kanavos Oct 04 '18 at 16:22
  • The way this class is called you only need *one* serializer, whose root element is *already* known when you construct `XmlLoadFile`. Instead of passing that element as a parameter to `AddFile`, create two serializers in `XmlLoadFile`'s constructor and store them in a field: one with a root element and one without – Panagiotis Kanavos Oct 04 '18 at 16:24
  • Finally, use a profiler or BenchmarkDotNet to see what's actually going on with your code and what's actually leaking – Panagiotis Kanavos Oct 04 '18 at 16:24
  • @PanagiotisKanavos, I think the serializer was the issue. I added it to the class as a property and and passed the root in the initializer and no more unnecessary memory use. If you want to make your comment an answer I'll upvote it. – A. Hasemeyer Oct 04 '18 at 17:36

1 Answers1

0

XmlSerializers get cached so the runtime doesn't have to rebuild one when one is already available only if the correct constructor is used though.

A. Hasemeyer
  • 1,452
  • 1
  • 18
  • 47