I attempted a binary search anyway, even though the file does not have fixed line lengths.
First some considerations, then the code:
Sometimes the last n lines of a log file need to be extracted, based on an ascending sort key at the beginning of each line. The key could really be anything, but in log files it typically represents a date-time, usually in the format YYMMDDHHNNSS (possibly with some punctuation).
Log files are typically text-based files consisting of multiple lines, at times millions of them. Often log files have fixed-length lines, in which case a specific key is quite easy to access with a binary search. Probably just as often, however, log files have variable line lengths. To access these, one can use an estimated average line length to calculate a file position relative to the end, and then process sequentially from there to the EOF.
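For comparison, the estimate-based approach could look roughly like the following. This is only a minimal sketch: the module name TailSketch, the function name TailByEstimate and its parameters iLines and iAvgLen are made up for illustration, and it assumes UTF-8 text with CR/LF line ends.

```vbnet
Imports System.IO
Imports System.Text

Public Module TailSketch
    'Returns roughly the last iLines lines of sLogFile, assuming an average
    'line length of iAvgLen bytes (hypothetical parameters, not part of the
    'function shown further down).
    Public Function TailByEstimate(sLogFile As String, iLines As Integer,
                                   iAvgLen As Integer) As String()
        Using oFS As New FileStream(sLogFile, FileMode.Open, FileAccess.Read,
                                    FileShare.Read)
            'Estimated start position, clamped to the start of the file.
            Dim lStart As Long = Math.Max(0L, oFS.Length - CLng(iLines) * iAvgLen)
            oFS.Seek(lStart, SeekOrigin.Begin)
            Using oReader As New StreamReader(oFS, Encoding.UTF8)
                Dim sTail As String = oReader.ReadToEnd()
                'Unless we start at the very beginning, the first line is
                'most likely incomplete, so drop it.
                If lStart > 0 Then
                    sTail = sTail.Substring(sTail.IndexOf(ControlChars.Lf) + 1)
                End If
                Return sTail.Split({ControlChars.CrLf},
                                   StringSplitOptions.RemoveEmptyEntries)
            End Using
        End Using
    End Function
End Module
```

Because the position is only an estimate, the result may contain more or fewer than iLines lines, and the caller still has to filter by key. That is the weakness the binary approach avoids.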
But a binary approach can also be employed for this type of file, as demonstrated here. Its advantage grows with the file size. A log file's maximum size is determined by the file system: NTFS theoretically allows for 16 EiB (16 x 2^60 B); in practice, under Windows 8 or Server 2012, the limit is 256 TiB (256 x 2^40 B).
(What 256 TiB actually means: a typical log file is designed to be readable by a human and rarely goes much beyond 80 characters per line. Let's assume your log file logs along happily and completely uninterrupted for an astonishing 12 years, a total of 4,383 days of 86,400 seconds each. Your application could then write 9 entries per millisecond into said log file and would only hit the 256 TiB limit in its 13th year.)
The great advantage of the binary approach is that n comparisons suffice for a log file of 2^n bytes, so the benefit grows rapidly with the file size: whereas 10 comparisons are required for a file of 1 KiB (1 per 102.4 B), only 20 comparisons are needed for 1 MiB (1 per 51.2 KiB), 30 for 1 GiB (1 per roughly 34.1 MiB), and a mere 40 comparisons for a file of 1 TiB (1 per 25.6 GiB).
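The comparison counts can be checked with a few lines of code. The following is just an illustrative sketch (the module name ComparisonCount and the function name Halvings are made up); it counts how often a window of a given size has to be halved until a single byte remains.

```vbnet
Imports System

Public Module ComparisonCount
    'Number of halving steps needed to narrow a window of lSize bytes down
    'to a single byte. lSize is passed by value, so the caller's variable
    'is not modified.
    Public Function Halvings(lSize As Long) As Integer
        Dim iSteps As Integer = 0
        Do While lSize > 1
            lSize = (lSize + 1) \ 2     'Round up to stay on the safe side.
            iSteps += 1
        Loop
        Return iSteps
    End Function
End Module
```

Halvings(1L << 10) yields 10, Halvings(1L << 20) yields 20, and Halvings(1L << 40) yields 40. Even a file at the 256 TiB limit (256L << 40) needs only 48 steps.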
Now to the function. These assumptions are made: the log file is encoded in UTF-8, the log lines are separated by a CR/LF sequence, and the timestamp is located at the beginning of each line in ascending order, probably in the format [YY]YYMMDDHHNNSS, possibly with some punctuation in between. (All of these assumptions could easily be changed and handled by function overloads.)
In an outer loop, the search window is narrowed by bisection, comparing against the provided earliest date-time to match. As soon as a new position within the stream has been determined, an independent forward search is made in an inner loop to locate the next CR/LF sequence. The byte after this sequence marks the start of the key of the record being compared. If this key is larger than or equal to the one we are searching for, it is ignored. Only if the found key is smaller than the one we are searching for is its position treated as a possible candidate for the record just before the one we want. We end up with the last record that has the largest key smaller than the searched key.
In the end, all log records following the final candidate are returned to the caller as a string array.
The function requires the import of System.IO (and of System.Linq for SequenceEqual, which VB projects usually import by default).
    Imports System.IO
    Imports System.Linq

    'This function expects a log file which is organized in lines of varying
    'lengths, delimited by CR/LF. At the start of each line is a sort criterion
    'of any kind (in log files typically YYMMDD HHMMSS), by which the lines are
    'sorted in ascending order (newest log line at the end of the file). The
    'earliest match allowed to be returned must be provided. From this the sort
    'key's length is inferred. The key need not necessarily exist in the file.
    'If it does, it can occur multiple times, like all other sort keys. The
    'returned string array contains all lines that follow the last line found
    'to be smaller than the provided sort key.
    Public Shared Function ExtractLogLines(sLogFile As String,
                                           sEarliest As String) As String()
        Dim lMin, lPos, lMax As Long                'Examined stream window.
        Dim i As Long                               'Iterator to find CR/LF.
        Dim abEOL(0 To 1) As Byte                   'Bytes to find CR/LF.
        Dim abCRLF() As Byte = {13, 10}             'Search for CR/LF.
        Dim bFound As Boolean                       'CR/LF found.
        Dim iKeyLen As Integer = sEarliest.Length   'Length of sort key.
        Dim sActKey As String                       'Key of examined log record.
        Dim abKey() As Byte                         'Reading the current key.
        Dim lCandidate As Long = -1                 'File position of promising candidate (-1: none yet).
        Dim bSkipFirst As Boolean = True            'Whether the first line read is to be dropped.
        Dim sRecords As String                      'All wanted records.

        'The byte array accepting the records' keys is as long as the provided
        'key.
        ReDim abKey(0 To iKeyLen - 1)               '0-based!

        Using oFS As New FileStream(sLogFile, FileMode.Open, FileAccess.Read,
                                    FileShare.Read) 'The log file as file stream.
            'We search the last log line whose sort key is smaller than the
            'sort key provided in sEarliest.
            lMin = 0                                'Start at stream start.
            lMax = oFS.Length - 1 - 2               '0-based, and without terminal CRLF.
            Do
                lPos = (lMax - lMin) \ 2 + lMin     'Position to examine now.
                'Although the key to be compared with sEarliest is located
                'after lPos, it is important that lPos itself is not modified
                'when searching for the key.
                i = lPos                            'Iterator for the CR/LF search.
                bFound = False
                Do While i < lMax
                    oFS.Seek(i, SeekOrigin.Begin)
                    oFS.Read(abEOL, 0, 2)
                    If abEOL.SequenceEqual(abCRLF) Then 'CR/LF found.
                        bFound = True
                        Exit Do
                    End If
                    i += 1
                Loop
                If Not bFound Then
                    'Between lPos and lMax there is no CR/LF, so lPos lies in
                    'the same line as lMax and no new key can be found from
                    'here. Leaving the loop at this point could miss a
                    'candidate further up, so continue towards the start.
                    lMax = lPos
                Else
                    i += 2                          'Skip CR/LF.
                    oFS.Seek(i, SeekOrigin.Begin)   'Read the key after the CR/LF
                    oFS.Read(abKey, 0, iKeyLen)     'into a string.
                    sActKey = System.Text.Encoding.UTF8.GetString(abKey)
                    'Compare the actual key with the earliest key. We want to
                    'find the largest key just before the earliest key.
                    If sActKey >= sEarliest Then
                        'Not interested in this one, look for an earlier key.
                        lMax = lPos
                    Else
                        'Possibly interesting, remember this.
                        lCandidate = i
                        lMin = lPos
                    End If
                End If
            Loop While lMin < lMax - 1

            If lCandidate < 0 Then
                'No candidate was found. The first line is never examined by
                'the search above, so compare its key here and drop it only if
                'it is smaller than sEarliest.
                lCandidate = 0
                oFS.Seek(0, SeekOrigin.Begin)
                oFS.Read(abKey, 0, iKeyLen)
                sActKey = System.Text.Encoding.UTF8.GetString(abKey)
                bSkipFirst = (sActKey < sEarliest)
            End If

            'lCandidate is the position of the first record to be taken into
            'account. Note that we need the final CR/LF here, so that the
            'search for the next CR/LF sequence below will match a valid first
            'entry even in case there are no entries to be returned (sEarliest
            'being larger than the last log line).
            ReDim abKey(CInt(oFS.Length - lCandidate - 1))  '0-based.
            oFS.Seek(lCandidate, SeekOrigin.Begin)
            oFS.Read(abKey, 0, CInt(oFS.Length - lCandidate))
        End Using                                   'We're done with the stream.

        'Convert into a string, omit the candidate line at the start, then
        'return as a string array split at CR/LF, without empty entries.
        sRecords = System.Text.Encoding.UTF8.GetString(abKey)
        If bSkipFirst Then
            sRecords = sRecords.Substring(sRecords.IndexOf(Chr(10)) + 1)
        End If
        Return sRecords.Split(ControlChars.CrLf.ToCharArray(),
                              StringSplitOptions.RemoveEmptyEntries)
    End Function