0

I have a very large log file. I need to find the last "WARN" line in that file efficiently (i.e., by reading from the end), parse it, and return it as an object with a "Date" field (DateTime type), a "Level" field, and a "Description" field.

Any suggestions?

Here's what the file looks like

[Mon Dec 14 14:57:53 2015] [notice] Child 6180: Acquired the start mutex.
[Mon Dec 14 14:57:53 2015] [notice] Child 6180: Starting 150 worker threads.
[Mon Dec 14 15:04:43 2015] [warn] pid file C:/Program Files (x86)/Citrix/XTE/logs/xte.pid overwritten -- Unclean shutdown of previous Apache run?
[Mon Dec 14 15:04:43 2015] [notice] Server built: May 27 2011 16:04:42
[Mon Dec 14 15:04:43 2015] [notice] Parent: Created child process 5608

EDIT: This command must look inside the file, find the last line matching the search criteria, return that line, and "stop". The possible duplicate question is different in a number of ways: my script cannot simply sit there and wait for a line to appear. It needs to run, get the line as quickly as possible, and get out. Furthermore, it needs to search by substring, and lastly it needs to return a DateTime and the other fields broken out. Thanks for not voting to close this question.

Igorek
  • Possible duplicate of [Unix tail equivalent command in Windows Powershell](http://stackoverflow.com/questions/4426442/unix-tail-equivalent-command-in-windows-powershell) – Jeroen Mostert May 11 '16 at 17:28
  • It is not equivalent by any means. I need to find the last matching line based on search criteria, not get the last lines. I also need to parse out a DateTime. Please do not vote to close – Igorek May 11 '16 at 17:35
  • In general, SO is a place to get help with code you've written that isn't working. It is not a place to ask for a script to be written for you. I'm surprised that someone with 10,000+ reputation would post this! – Tony Hinkle May 11 '16 at 17:56
  • The only substantial difference with [the other question](http://stackoverflow.com/questions/4426442/unix-tail-equivalent-command-in-windows-powershell) is that you need to filter based on contents -- breaking up the fields is a trivial exercise compared to quickly scanning the file in reverse. Nevertheless, that *is* a substantial difference. – Jeroen Mostert May 11 '16 at 18:14
  • @TonyHinkle Appreciate the comment. I made feeble attempts at writing a PowerShell script, they all failed, and since I don't know PowerShell at all, I was too embarrassed to post them ;) – Igorek May 11 '16 at 19:08

3 Answers

0

Open the file as a raw Stream, seek a "decent" block size back from the end (say 1 MB), then search the resulting bytes for the binary representation of "warn" until you've found the last instance (I'm assuming you know the encoding in advance). If you find it, scan backwards and forwards for the line terminators around it. If you don't find it, seek back to the preceding 1 MB block (that is, 1 MB past the block you just read) and go again. Repeat until you reach the beginning of the file.

If there is no "warn" in the entire file, this will be slower than just reading the file sequentially, but if you're certain there's a line of the kind you want near the end, this can terminate pretty quickly. The essential thing to do is not read the file as text with a StreamReader, since you lose the ability to seek arbitrarily.

Actually getting the code for this idea right is more involved. The difficulty of this operation is not due to anything in PowerShell -- there is no simple way to do this in any language, because reading a file in reverse is not an efficient operation in any file system I know of.
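
As a rough illustration of the block-by-block idea, here is a minimal sketch rather than a tuned implementation. It takes a slightly simpler route than searching raw bytes: each block is decoded and split into lines, and the incomplete first line of a block is carried over to the next (earlier) block. The function name `Get-LastMatchingLine` and its parameters are hypothetical, and it assumes a single-byte, ASCII-compatible encoding with LF or CRLF line endings.

```powershell
# Hypothetical helper: scans a file from the end in fixed-size blocks.
# Assumes an ASCII-compatible single-byte encoding (an assumption, adjust as needed).
function Get-LastMatchingLine {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [string] $Pattern = '[warn]',
        [int] $BlockSize = 1MB
    )
    # .NET resolves relative paths against the process directory, so resolve first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $encoding = [System.Text.Encoding]::ASCII
    $stream = [System.IO.File]::Open($fullPath, [System.IO.FileMode]::Open,
        [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
    try {
        $position = $stream.Length
        $carry = ''   # incomplete line belonging to the block read previously
        while ($position -gt 0) {
            $count = [int][Math]::Min([long]$BlockSize, $position)
            $position -= $count
            $null = $stream.Seek($position, [System.IO.SeekOrigin]::Begin)
            $buffer = New-Object byte[] $count
            $read = 0
            while ($read -lt $count) {
                $n = $stream.Read($buffer, $read, $count - $read)
                if ($n -le 0) { break }
                $read += $n
            }
            $text = $encoding.GetString($buffer, 0, $read) + $carry
            $lines = @($text -split '\r?\n')
            # Unless we are at the start of the file, the first line may be cut off.
            $first = if ($position -gt 0) { 1 } else { 0 }
            for ($i = $lines.Count - 1; $i -ge $first; $i--) {
                if ($lines[$i].IndexOf($Pattern, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
                    return ($lines[$i] -replace '\r$', '')
                }
            }
            $carry = $lines[0]
        }
        return $null
    }
    finally {
        $stream.Dispose()
    }
}
```

Calling `Get-LastMatchingLine -Path C:\LogFile.log` would return the last line containing `[warn]` (case-insensitively), or nothing if there is none. If the match is near the end of the file, only the final block or two is ever read.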

Jeroen Mostert
0

I'd approach that this way:

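# -ReadCount 3000 sends lines down the pipeline in arrays of up to 3000, which is far faster than one line at a time.
Get-Content $file -ReadCount 3000 |
    ForEach-Object {
        # -like against an array returns the matching elements; a non-empty result is truthy.
        if ($_ -like '*warn*') { $Lastfound = $_ }
    }

# $Lastfound is the last batch that contained a match; take the final matching line in it.
($Lastfound -like '*warn*')[-1]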
Bacon Bits
mjolinor
-1

It's certainly not going to be efficient. Everything in PowerShell and C# (and everything else) is built around reading forwards, not backwards. Given that and the fact that you don't even know where the last line might be, I don't see any way to avoid processing the whole file unless you want to spend several hours writing your own ReverseStreamReader.

Assuming the file is bigger than RAM -- which makes Get-Content impractical, IMO -- I'd probably do something like:

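$LineNumber = [uint64]0
$LastLineNumber = $null
$LastLineMatch = $null
$SearchPattern = [Regex]::Escape('[warn]')
$StreamReader = New-Object System.IO.StreamReader -ArgumentList "C:\LogFile.log"
try {
    # Compare against $null explicitly: an empty line is falsy and would end the loop early.
    while ($null -ne ($Line = $StreamReader.ReadLine())) {
        $LineNumber++
        if ($Line -match $SearchPattern) {
            # Remember the most recent match; the last one seen wins.
            $LastLineNumber = $LineNumber
            $LastLineMatch = $Line
        }
    }
}
finally {
    # Release the file handle even if reading fails.
    $StreamReader.Close()
}

$LastLineNumber
$LastLineMatch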

Parsing the line is probably going to involve a lot of String.IndexOf() and String.Substring(). Turning the date into a DateTime should be done like so:

[datetime]::ParseExact('Mon Dec 14 15:04:43 2015','ddd MMM dd HH:mm:ss yyyy',[System.Globalization.CultureInfo]::InvariantCulture,[System.Globalization.DateTimeStyles]::None);

I chose -match over -like because as far as I can tell it actually performs better. That might be just my system, however.
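
For the parsing step, a regular expression with named groups may be less fiddly than IndexOf() and Substring(). The following is only a sketch: the function name `ConvertFrom-LogLine` is hypothetical, the pattern assumes every line has the `[date] [level] description` shape shown in the question, and `[pscustomobject]` requires PowerShell 3.0 or later.

```powershell
# Hypothetical helper: splits one log line into Date, Level and Description.
# Assumes the "[date] [level] description" layout from the question's sample.
function ConvertFrom-LogLine {
    param(
        [Parameter(Mandatory = $true)] [string] $Line
    )
    $pattern = '^\[(?<date>[^\]]+)\]\s+\[(?<level>[^\]]+)\]\s+(?<description>.*)$'
    $m = [regex]::Match($Line, $pattern)
    if (-not $m.Success) {
        # Line does not have the expected shape.
        return $null
    }
    # Same format string as the ParseExact call above.
    $date = [datetime]::ParseExact(
        $m.Groups['date'].Value,
        'ddd MMM dd HH:mm:ss yyyy',
        [System.Globalization.CultureInfo]::InvariantCulture,
        [System.Globalization.DateTimeStyles]::None)
    [pscustomobject]@{
        Date        = $date
        Level       = $m.Groups['level'].Value
        Description = $m.Groups['description'].Value
    }
}
```

Used together with the loop above, `ConvertFrom-LogLine -Line $LastLineMatch` would yield an object whose `Date` property is a real DateTime, so it can be compared or sorted directly.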

Bacon Bits