To provide some context, I'm trying to optimize the code below, which reads a file line by line, buffers the lines, and saves them to the database every 100 lines:
using (StreamReader sr = new StreamReader(fileName, Encoding.Default))
{
    IList<string> list = new List<string>();
    int lineCount = 0;
    foreach (var line in sr.ReadLines((char)someEOL)) // ReadLines is an extension method that yield returns lines based on someEOL while reading character by character
    {
        list.Add(line); // Keeping it simple for this example. In the actual code it goes through a bunch of operations
        if (++lineCount % 100 == 0) // Will not work if the total number of lines is not a multiple of 100
        {
            SaveToDB(list);
            list = new List<string>();
        }
    }

    if (list.Count > 0)
        SaveToDB(list); // I would like to get rid of this. This is for the case when the total number of lines is not a multiple of 100.
}
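For reference, ReadLines is essentially the following (a simplified sketch; the real code does more, but this is the shape):

using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamReaderExtensions
{
    // Reads the stream character by character and yields a line
    // whenever the custom end-of-line character is encountered.
    public static IEnumerable<string> ReadLines(this StreamReader reader, char eol)
    {
        var sb = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            if (ch == eol)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append((char)ch);
            }
        }
        if (sb.Length > 0) // trailing content with no final EOL character
            yield return sb.ToString();
    }
}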
As you would notice, SaveToDB(list) happens twice in the above code. The second call is needed when the total number of lines % 100 != 0: for example, with 101 lines, the if(lineCount % 100 == 0) check misses the last one. It's not a huge bother, but I'm wondering if I can get rid of it.
To that end, if I could read the total number of lines before entering the foreach loop, I could write the if(lineCount % 100 == 0) check differently. But finding the total number of lines requires going through the file character by character to count someEOL, which is a definite no because the file size can range from 5 to 20 GB. Is there a way to do the count without a performance penalty (which seems doubtful to me, but maybe there is a solution)? Or is there another way to rewrite this to get rid of that extra SaveToDB(list) call?
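To be clear about the kind of rewrite I'm picturing: something like a hypothetical Batch extension (not something I have) that yields full chunks and then the final partial chunk, so the calling loop invokes SaveToDB exactly once:

using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Groups any sequence into lists of up to `size` items, yielding the
    // final partial batch as well, so the caller never has to special-case
    // a total that isn't a multiple of `size`.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0) // leftover items when the total % size != 0
            yield return batch;
    }
}

With that, the loop collapses to:

foreach (var batch in sr.ReadLines((char)someEOL).Batch(100))
    SaveToDB(batch);

(On .NET 6+, the built-in Enumerable.Chunk(100) does the same job.) I realize this just moves the leftover check into the iterator rather than eliminating it, which is why I'm asking whether there's a cleaner way.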