Question: What is the best way to parse files that are missing the newline character at the end of the file? Should I just wrap the read in a try/catch for OutOfMemoryException? Or is there a better way?
Background: I am parsing log files using StreamReader's ReadLine() method to read in the next line. The basic loop structure looks like this:
while ((line = sr.ReadLine()) != null)
{
    // Parse the line
}
This works well, even on large files (i.e., > 2 GB). However, when the last line of the file is not null and does not end with a newline character, StreamReader just reads blank spaces until all memory is consumed and an OutOfMemoryException is thrown. Is catching that exception the best way to handle a missing newline character at the end of the file, or are there better ways of handling this problem?
Note: the file is created by IIS Exchange Server. I have not dug into this with our IT group, but the file appears to be cut off mid-creation, which leaves the last row incomplete because it is missing data.
Research: I found a post on SO (see below) that refers to using File.ReadFile. While that works on a much smaller file (i.e., < 2 GB) that is missing the newline character, it still fails on large files (i.e., > 2 GB).
https://stackoverflow.com/a/13416225
Edit
Execution stops at the while line in the code sample below. The problem is not with the code, but with the file. I cannot post our log files, but to demonstrate, create a few rows of data in Notepad++. For the last row of the file, remove the newline character and then run the code against the file. StreamReader will blow up on the last row because it cannot find the end of the row.
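For anyone who prefers to build the test file in code rather than by hand, here is a minimal sketch of the repro steps described above. The class name MissingNewlineRepro, the method names, and the placeholder row contents are all made up for illustration; they are not taken from our real parser or log files. It only writes a small file whose last row has no trailing newline and then reads it back with the same ReadLine loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of the real parser.
public static class MissingNewlineRepro
{
    // Writes three placeholder rows. The first two end with CRLF;
    // the last row is cut off and has no trailing newline.
    public static void WriteSample(string path)
    {
        File.WriteAllText(path, "timestamp-1\r\ntimestamp-2\r\ntimestamp-3 25");
    }

    // Reads the file with the same ReadLine loop as in the question
    // and returns every row the loop produced.
    public static List<string> ReadRows(string path)
    {
        var rows = new List<string>();
        using (var sr = new StreamReader(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                rows.Add(line);
            }
        }
        return rows;
    }
}
```

Calling WriteSample and then ReadRows on a temporary path (for example one returned by Path.GetTempFileName()) is a quick way to check whether a missing final newline on its own triggers the failure, or whether something else in the real file, which I cannot share, is involved.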
Below is a copy of the log file with all data removed except the timestamp and the newline character at the end of each row. For the last row, I included the last data element (port number) before the data cuts off. Notice that the last row is missing the newline character.