I followed the advice in this SO question, but it did not work for me. Here is my situation and the code associated with it.

I have a very large list with 2.04M items in it. I read it into memory, sort it, and then write it to a .csv file. I have 11 .csv files that I need to read and subsequently sort. The first iteration gives me a memory usage of just over 1GB. I tried setting the list to null, calling List.Clear(), and calling List.TrimExcess(). I have also waited for the GC to do its thing, hoping that it would know that there are no reads or writes going to that list.

Here is the code that I am using. Any advice is always greatly appreciated.

foreach (var zone in zones)
{

    var filePath = string.Format("outputs/zone-{0}.csv", zone);

    var lines = new List<string>();

    using (StreamReader reader = new StreamReader(filePath))
    {

        var headers = reader.ReadLine();

        while(! reader.EndOfStream)
        {
            var line = reader.ReadLine();

            lines.Add(line);
        }

        //sort the file, then rewrite the file into inputs

        lines = lines.OrderByDescending(l => l.ElementAt(0)).ThenByDescending(l => l.ElementAt(1)).ToList();

        using (StreamWriter writer = new StreamWriter(string.Format("inputs/zone-{0}-sorted.csv", zone)))
        {
            writer.WriteLine(headers);
            writer.Flush();

            foreach (var line in lines)
            {
                writer.WriteLine(line);
                writer.Flush();
            }
        }

        lines.Clear();
        lines.TrimExcess();

    }
}
Rijnhardt
  • And what is the problem? If the GC doesn't reclaim memory, it's because it doesn't need to. From the code you've posted there doesn't seem to be anything wrong. – InBetween Jan 04 '17 at 14:25
  • @InBetween the problem is an out of memory exception on the second pass through of the `foreach` – Rijnhardt Jan 04 '17 at 14:26
  • Are you running on a 32 bit environment? – InBetween Jan 04 '17 at 14:32
  • @InBetween I believe that is happening because the memory allocated for the first iteration's list is not being collected. The GC is also not collecting the memory allocated to it when the next iteration starts. How can I move on from here? As far as I am aware, I am using a 64-bit environment. Let me confirm. – Rijnhardt Jan 04 '17 at 14:33
  • I am running a 64-bit environment – Rijnhardt Jan 04 '17 at 14:34
  • Could it be a [disk cache issue](http://stackoverflow.com/questions/383324/how-to-ensure-all-data-has-been-physically-written-to-disk)? – stuartd Jan 04 '17 at 14:36
  • Then it's not a problem of the GC. In a 32-bit environment you could hit this issue because the process space is actually pretty limited, but not in 64-bit; see [here](http://stackoverflow.com/a/1088044/767890). You could be hitting the object size limit (2 GB), but that would be very strange. If you're using .NET 4.5, try the `gcAllowVeryLargeObjects` config setting mentioned in the linked answer just to see if the issue goes away. – InBetween Jan 04 '17 at 14:37
  • Let me try the ``` ``` – Rijnhardt Jan 04 '17 at 14:42
  • @InBetween I tried `gcAllowVeryLargeObjects` in the `App.config`, but it did not work. Here is a screenshot with the diagnostics on the side: http://imgur.com/ZQ6sTEC – Rijnhardt Jan 04 '17 at 14:53
  • Don't put the items in a `List` in the first place. You're pointlessly creating a new list, and spending a huge amount of resources to create it, then you're doing nothing with it and just throwing it on the floor and then worrying about how much work it is to clean it up. Since you don't need it, don't create it in the first place. – Servy Jan 04 '17 at 15:44
  • You also shouldn't be constantly flushing your stream. It dramatically harms the performance by inhibiting its ability to properly buffer the data. – Servy Jan 04 '17 at 15:45
  • @Servy Fair analogy. What other options do you recommend for sorting 2.04M elements outside of a list? – Rijnhardt Jan 05 '17 at 02:15
  • @Rijnhardt You're not using a list to sort anything, you're using a LINQ method to sort the items, then you're putting them in a list, then you're ignoring the list and treating it just like you'd used the results of a LINQ operation except that you've used a ton of memory and CPU time putting all of those items into a list that you didn't need. If you actually used the list and sorted the items in that list it would be at least accomplishing *something*, but there's no real reason to use one at all. – Servy Jan 05 '17 at 03:17
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/132379/discussion-between-rijnhardt-and-servy). – Rijnhardt Jan 05 '17 at 11:13
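
A minimal sketch of what Servy's comments above seem to suggest: stream the rows straight into the LINQ sort and write the result without building a `List` or flushing after every line. The names `ZoneSorter` and `SortCsv` are hypothetical, the sort keys are copied from the question (first character, then second character), and the demo uses temporary files rather than the real zone files. Note that `OrderByDescending` still has to buffer every row in order to sort, so this only drops the intermediate lists and the per-line `Flush` calls; it is not a guaranteed fix for the out-of-memory exception.

```csharp
using System;
using System.IO;
using System.Linq;

static class ZoneSorter
{
    // Hypothetical helper (not from the question): sorts the data rows of one
    // CSV file and writes them, header first, to outputPath.
    public static void SortCsv(string inputPath, string outputPath)
    {
        // Read only the header line; the file is closed again afterwards.
        string header = File.ReadLines(inputPath).FirstOrDefault();
        if (header == null) return; // empty input: nothing to write

        // Lazily stream the remaining rows into the sort. Same keys as the
        // question (l[0] and l[1] are equivalent to ElementAt(0)/ElementAt(1)).
        // Assumption: every data row has at least two characters.
        var sorted = File.ReadLines(inputPath)
            .Skip(1)
            .OrderByDescending(l => l[0])
            .ThenByDescending(l => l[1]);

        using (var writer = new StreamWriter(outputPath))
        {
            writer.WriteLine(header);
            foreach (var line in sorted)
            {
                writer.WriteLine(line); // no Flush here; Dispose flushes once
            }
        }
    }

    static void Main()
    {
        // Tiny demo with temporary files standing in for the zone files.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllLines(input, new[] { "id,name", "a1,foo", "b2,bar", "b1,baz" });

        SortCsv(input, output);
        Console.WriteLine(string.Join("|", File.ReadAllLines(output)));
        // prints: id,name|b2,bar|b1,baz|a1,foo

        File.Delete(input);
        File.Delete(output);
    }
}
```

In the question's loop, this would be called once per zone, for example `ZoneSorter.SortCsv(string.Format("outputs/zone-{0}.csv", zone), string.Format("inputs/zone-{0}-sorted.csv", zone))`.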

1 Answer

Try putting the whole thing in a using:

using (var lines = new List<string>())
{ ... }

Although I'm not sure about the nested usings.

Instead, where you have lines.Clear();, add lines = null;. That should encourage the garbage collector.

Peter Bill
  • `List` doesn't implement `IDisposable`, so this won't compile. It has no unmanaged resources. – Servy Jan 04 '17 at 15:42
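
Servy's objection can be checked with a short reflection test. This is only an illustrative sketch (the class name `DisposableCheck` is made up): it asks whether each type is assignable to `IDisposable`, which is what a `using` statement requires of its resource.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DisposableCheck
{
    static void Main()
    {
        // List<string> does not implement IDisposable, so it cannot be the
        // resource of a using statement.
        Console.WriteLine(typeof(IDisposable).IsAssignableFrom(typeof(List<string>)));  // False

        // StreamReader does implement it, which is why the question's
        // using (StreamReader reader = ...) block compiles.
        Console.WriteLine(typeof(IDisposable).IsAssignableFrom(typeof(StreamReader))); // True
    }
}
```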