I have a large file (>200 MB). The file is a CSV file from an external party, but sadly I cannot just read it line by line, because only the sequence \r\n
marks the end of a record; the data itself contains stray line-break characters that must not split a record.
Currently I am reading in all the lines using this approach:
var file = File.ReadAllText(filePath, Encoding.Default);
var lines = Regex.Split(file, @"\r\n");
for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    ...
}
How can I optimize this? After calling ReadAllText on my 225 MB file, the process uses more than 1 GB of RAM. Is it possible to use a streaming approach here, where I still split the file on my \r\n pattern?
EDIT1:
Your solutions using File.ReadLines and a StreamReader will not work, because they also treat a lone \n as a line break, so records that contain an embedded \n get split apart. I need to split the file only on my \r\n pattern. Reading the file with my code results in 758,371 lines (which is correct), whereas a normal line count gives more than 1.5 million.
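To illustrate the problem (a minimal sketch with made-up sample data; the class name and the field values are placeholders): ReadLine treats a lone \n as a line terminator, so an embedded \n inside a record produces an extra line:

using System;
using System.IO;

class ReadLineRepro
{
    static void Main()
    {
        // Hypothetical data: the first record contains an embedded \n inside a field.
        string sample = "field1;multi\nline field;field3\r\nnext record\r\n";

        int count = 0;
        using (var reader = new StringReader(sample))
        {
            while (reader.ReadLine() != null)
                count++;
        }

        Console.WriteLine(count); // prints 3, although only 2 records are delimited by \r\n
    }
}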
SOLUTION
public static IEnumerable<string> ReadLines(string path)
{
    // Streams the file and yields one record per "\r\n"; lone \r or \n
    // characters inside a record are kept as part of the record.
    const string delim = "\r\n";
    using (StreamReader sr = new StreamReader(path))
    {
        StringBuilder sb = new StringBuilder();
        int matched = 0; // how many delimiter characters have been matched so far

        while (!sr.EndOfStream)
        {
            char c = (char)sr.Read();
            sb.Append(c);

            if (c == delim[matched])
            {
                matched++;
                if (matched == delim.Length)
                {
                    // Complete "\r\n" found: strip it and yield the record.
                    sb.Remove(sb.Length - delim.Length, delim.Length);
                    yield return sb.ToString();
                    sb.Clear();
                    matched = 0;
                }
            }
            else
            {
                // Mismatch: the current character may itself start a new delimiter.
                matched = (c == delim[0]) ? 1 : 0;
            }
        }

        // Emit the last record if the file does not end with "\r\n".
        if (sb.Length > 0)
            yield return sb.ToString();
    }
}
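Usage is then a simple foreach; only the current record and the reader's buffer have to be held in memory (the path below is just a placeholder):

foreach (string line in ReadLines(@"C:\data\export.csv"))
{
    // process one \r\n-delimited record at a time
}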