I'm trying to parse a big CSV file. To do that, I launch my parsing function in a new thread, and it does this:
using (StreamReader sr = new StreamReader(CurrentFilePath, Encoding.UTF8))
{
    while (!sr.EndOfStream)
    {
        String strLine = sr.ReadLine();
        String[] strFields = strLine.Split('\t');
        // Processing my array
    }
}
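For completeness, the method containing this loop is started on its own thread, roughly like this (simplified; ParseFile is just a placeholder name for the method that holds the loop above):

// Simplified launch; ParseFile stands in for the real parsing method.
Thread parseThread = new Thread(() => ParseFile(CurrentFilePath));
parseThread.IsBackground = true;
parseThread.Start();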
Nothing really unusual here. But I noticed that every string created by String.Split stays in memory, so when I parse a file with X lines and Y columns, I end up with almost X*Y strings still alive (according to .NET Memory Profiler, which also says they have not been collected by the GC).
Is it because the parsing is launched in a different thread? Any ideas?
-- EDIT -- I'm storing 20 of the 31 columns of my CSV in this class:
class InputEntry {
    public String Field1 { get; set; }
    public String Field2 { get; set; }
    public String Field3 { get; set; }
    // ... (20 string properties in total)
}
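Each parsed line is mapped to one of these objects roughly like this (simplified; the real code fills 20 of the 31 columns, and the indexes below are made up):

// entries is declared once before the read loop:
// List<InputEntry> entries = new List<InputEntry>();
InputEntry entry = new InputEntry
{
    Field1 = strFields[0],
    Field2 = strFields[3],
    Field3 = strFields[5]
    // ... and so on for the 20 stored columns
};
entries.Add(entry);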
If I load a 216 MB file (31 columns by 288,000 lines) and store each line in a List<InputEntry>, it takes about 450 MB of memory, even though the average string length is 37 characters.
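For what it's worth, here is my rough math on the character data alone (no per-string or per-object overhead counted, and assuming my numbers are right):

// Back-of-the-envelope estimate of the stored character data:
long lines = 288000;
long storedColumns = 20;
long avgChars = 37;                                     // average string length I measured
long charBytes = lines * storedColumns * avgChars * 2;  // .NET strings are UTF-16: 2 bytes per char
Console.WriteLine(charBytes / (1024 * 1024));           // ~406 MiB before any overhead

So the stored fields alone already account for roughly 400 MB before counting per-string overhead and the InputEntry objects themselves, largely because strings are UTF-16 in memory while the file on disk is UTF-8.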