I'm trying to parse a large CSV file. To do that, I run my parsing function in a new thread, which does this:

 using (StreamReader sr = new StreamReader(CurrentFilePath, Encoding.UTF8))
 {
     while (!sr.EndOfStream)
     {
         String strLine = sr.ReadLine();
         String[] strFields = strLine.Split('\t');
         //Processing my array
     }
 }

Nothing really unusual here. But I noticed that every string created by String.Split is kept in memory, so when I parse a file with X lines and Y columns, almost X*Y strings are still in memory (according to .NET Memory Profiler, which also says they have not been collected by the GC).

Is it because the function is launched in a different thread? Any ideas?

-- EDIT -- I'm storing 20 of the 31 columns of my CSV in this class:

    class InputEntry {
         public String Field1 {get;set;}
         public String Field2 {get;set;}
         public String Field3 {get;set;}
         ...
    }

If I load a 216 MB file (31 columns by 288000 lines) and store each line in a list of my InputEntry class, it takes 450 MB in memory, even though the average string length is 37 chars.
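
As a rough sanity check on that figure, the cost of the stored strings can be estimated with a small sketch. It assumes a 64-bit process, where each .NET string holds UTF-16 characters (2 bytes each) plus a fixed per-object overhead taken here as roughly 26 bytes. That overhead value and the `EstimateBytes` helper are assumptions for illustration, not measured values.

```csharp
using System;

static class StringMemoryEstimate
{
    // Assumed per-string overhead on a 64-bit CLR (object header, length, terminator).
    // Approximate; the real value depends on runtime and alignment.
    const long AssumedOverheadBytes = 26;

    // Approximate heap size of `count` strings averaging `avgChars` UTF-16 chars each.
    public static long EstimateBytes(long count, int avgChars)
    {
        return count * (AssumedOverheadBytes + 2L * avgChars);
    }

    static void Main()
    {
        long stored = 288000L * 20;              // lines * stored columns, from the question
        long bytes = EstimateBytes(stored, 37);  // 37 = average string length, from the question
        Console.WriteLine("{0} strings, about {1} MB", stored, bytes / (1024 * 1024));
    }
}
```

Under these assumptions the estimate comes out at several hundred megabytes, the same order of magnitude as the 450 MB observed, so the stored fields alone could plausibly account for most of the footprint.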

SeyoS
  • Are you storing those string arrays anywhere else which isn't a local variable? – Yuval Itzchakov Apr 15 '15 at 13:19
  • What does `//Processing my array` do with these strings? – spender Apr 15 '15 at 13:20
  • Do you have enough memory? If so, why should the garbage collector get active? – Tim Schmelter Apr 15 '15 at 13:20
  • I'm only storing some of them (there are 31 columns and I'm only storing 10). There are too many of them still in memory. – SeyoS Apr 15 '15 at 13:20
  • @spender: It's just to show that I do some stuff right after. – SeyoS Apr 15 '15 at 13:21
  • Can you show us how you're storing some of them? – Yuval Itzchakov Apr 15 '15 at 13:22
  • Is this causing you any issues? Are you sure the GC just hasn't gotten around to the strings yet? – Sayse Apr 15 '15 at 13:22
  • If you really have a problem with this, add a `GC.Collect()` just before the `}` and see if the memory is collected. – xanatos Apr 15 '15 at 13:28
  • Assuming OP is using .net memory profiler snapshots to make their measurements, it forces a collection before measuring. It should also be possible to show how the string instances are rooted/reachable using the profiling tool... – spender Apr 15 '15 at 13:50
  • @TrueBlueAussie Take a read: http://en.wikipedia.org/wiki/Comma-separated_values (in particular "separated by some other character or string, most commonly a literal comma or tab"). Hampered by lack of an official standard, it's not possible to make definitive statements about CSV! – spender Apr 15 '15 at 13:52
  • You should look at these "bad" strings in the profiler and figure out what's holding on to them. http://memprofiler.com/OnlineDocs/default.htm?turl=rootpathsinthegraph.htm – spender Apr 15 '15 at 13:55
  • @TrueBlueAussie CSV can also stand for character separated values. – Darren Young Apr 15 '15 at 13:57
  • @Darren Young: It can also mean Chicken Soaked Vertebrae, but I surrender. Comment removed :) – iCollect.it Ltd Apr 15 '15 at 13:58
  • @TrueBlueAussie Not really in this context though! :) Sincerely A. Pedant! – Darren Young Apr 15 '15 at 13:58
  • @Darren Young: That depends on the text content of the file. Might be all about them :> – iCollect.it Ltd Apr 15 '15 at 13:59

1 Answer

This happens because garbage collection in .NET is, by design, non-deterministic: it runs only when the runtime decides it is worthwhile. A collection frees memory, but it costs CPU time, so the runtime does not necessarily collect as soon as objects become unreachable, especially when memory is not in high demand at the time. Ultimately, .NET decides when to collect.

You can, however, issue a call to cause garbage collection immediately, like so:

GC.Collect();

This is generally discouraged, though.
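
For diagnosis rather than production use, a forced collection can help tell apart strings that are merely uncollected from strings that are still referenced. Below is a minimal sketch, assuming a hypothetical `MeasureRetainedBytes` helper and a hypothetical long-lived `kept` list standing in for the list of `InputEntry` objects. `GC.GetTotalMemory(true)` performs a collection before reporting, so whatever remains afterwards is still reachable from somewhere.

```csharp
using System;
using System.Collections.Generic;

static class RetainedMemoryCheck
{
    // Runs `work` and returns the change in managed heap size,
    // measured after a forced collection on both sides.
    public static long MeasureRetainedBytes(Action work)
    {
        long before = GC.GetTotalMemory(true);   // true = collect before measuring
        work();
        long after = GC.GetTotalMemory(true);
        return after - before;
    }

    static void Main()
    {
        // Hypothetical long-lived list, playing the role of the stored entries.
        var kept = new List<string>();

        // Case 1: split results are used only locally, so they become garbage.
        long discarded = MeasureRetainedBytes(() =>
        {
            for (int i = 0; i < 100000; i++)
            {
                string[] fields = ("a" + i + "\tb" + i).Split('\t');
            }
        });

        // Case 2: one field per line is stored, so those strings stay reachable.
        long retained = MeasureRetainedBytes(() =>
        {
            for (int i = 0; i < 100000; i++)
            {
                string[] fields = ("a" + i + "\tb" + i).Split('\t');
                kept.Add(fields[0]);
            }
        });

        Console.WriteLine("Discarded case: {0} bytes retained", discarded);
        Console.WriteLine("Stored case:    {0} bytes retained", retained);
        GC.KeepAlive(kept);   // keep the list reachable until both measurements are done
    }
}
```

If the second figure stays high even after a forced collection, the strings are being held by a live reference (for example, the stored entries), and calling `GC.Collect()` will not reclaim them.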

Find more information here:

GC.Collect (MSDN)

When is it acceptable to call GC.Collect?

Biscuits
  • Assuming OP is taking snapshots, .NetMemoryProfiler forces a collection before measuring – spender Apr 15 '15 at 13:44
  • Thanks Biscuits, but I look at the memory taken by my app after the process is done, so the GC is supposed to have already collected everything. I tried to force it with `GC.Collect()` but the result was the same. – SeyoS Apr 16 '15 at 07:43