
I think this should be simple. I write the logging output of xcopy to a plain text file, with a daily delimiter (literally) "++++++++++++++++++++Tue 07/03/2018 0900 PM" appended to the log file prior to each daily backup. So the last lines in the file typically look like this:

daily delimiter

A new day appends a new delimiter line and so on.

I want to display the LAST delimiter and the lines which follow it to eof.

The approaches I've tried (Get-Content, Select-String -Context 0,20) don't work:

PowerShell says my search string ++++++++++++++++++++ isn't a regular expression, doesn't recognize the path, etc. Any help?

Memory and time are not at issue. Sorry if this is too simple.

ISlimani
4 Answers


mjsqu's helpful answer explains the need to escape + characters as \+ in a regex in order for them to be treated as literals.

Thus, the regex to match a header line - 20 + chars. at the start of a line (^) - is: ^\+{20}

That said, if it is sufficient to detect header lines by their 20 leading + signs, Get-Content -Delimiter - which supports only literal delimiters - offers a simple and efficient solution (PSv3+; assumes input file some.log in the current directory ./):

 $headerPrefix = '+' * 20  # -> '++++++++++++++++++++'
 $headerPrefix + (Get-Content ./some.log -Delimiter $headerPrefix -Tail 1)

-Delimiter uses the specified header-line signature to break the file into "lines" (text between instances of the delimiter, which are blocks of lines here) and -Tail 1 returns the last "line" (block) by searching for it from the end of the file. Tip of the hat to mjsqu for helping me arrive at this solution.


The following alternative solutions are regular-expression-based, which enables more sophisticated header-line matching.

Note: While none of the solutions below require reading the log file into memory as a whole, they do read through the entire file, not just from the end.


We can use this regex in a switch -regex -file statement to process all lines of the log file in order to collect the lines that start with and follow the last ^\+{20} match; the code assumes input file path ./some.log:

# Process all lines in the log file and 
# collect each block's lines along the way in 
# array $lastBlockLines, which means that after 
# all lines have been processed, $lastBlockLines contains
# the *last* block's lines.
switch -regex -file ./some.log {
  '^\+{20}' { $lastBlockLines = @($_) } # start of new block, (re)initialize array
  default   { $lastBlockLines += $_ }   # add line to block
}

# Output the last block's lines.
$lastBlockLines

Alternatively, if you're willing to assume a fixed maximum number of lines in a block, a single-pipeline solution using Select-String is possible:

Select-String '^\+{20}' ./some.log -Context 0,100 | Select-Object -Last 1 | 
  ForEach-Object { $_.Line; $_.Context.PostContext }

  • Select-String '^\+{20}' ./some.log -Context 0,100 matches all header lines in file ./some.log and, thanks to -Context 0,100, includes (up to) 100 lines that follow a matching line in the match object that is emitted (the 0 means that no lines that precede a matching line are to be included).

  • Select-Object -Last 1 passes only the last match on.

  • ForEach-Object { $_.Line; $_.Context.PostContext } then outputs the last match's matching line as well as the up to 100 lines that follow it.


If you don't mind reading the file twice, you can combine Select-String with Get-Content ... | Select-Object -Skip:

Get-Content ./some.log | Select-Object -Skip (
    (Select-String '^\+{20}' ./some.log | Select-Object -Last 1).LineNumber - 1
  )

This takes advantage of the fact that the match objects emitted by Select-String have a .LineNumber property reflecting the number of the line on which a given match was found. Passing the last match's line number minus 1 to Get-Content ... | Select-Object -Skip then outputs the matching line as well as all subsequent ones.

mklement0
  • I was playing around with a solution which involved reading the file from the end, until "+++++" and then stopping and regurgitating. Didn't quite get there though. I think that'd be the quickest way, if the file was absolutely huge. – mjsqu Jul 05 '18 at 21:36
  • @mjsqu: Indeed, reading in chunks from the end of the file until a header line is detected would indeed be the fastest approach with large files, but that requires significantly more effort and direct use of the .NET framework. – mklement0 Jul 06 '18 at 01:45
  • Repeat this: `get-content bigfile.txt -tail $i` incrementing `$i` until header detected? – mjsqu Jul 06 '18 at 02:08
  • @mjsqu: Promising in that it doesn't require reading the entire file, but does involve repeated `Get-Content` invocations that each have to open the file and search from the end, with each invocation duplicating the effort of the previous one. However, you can combine `-Delimiter ('+' * 20)` with `-Tail 1`, which only requires a single, read-from-the-end invocation - please see my update. – mklement0 Jul 06 '18 at 03:25
  • mklement0's first suggestion worked fast and true. All I had to do was remove the ./ in front of some.log. Combining -Tail command with -Delimiter . . . Brilliant! – fblackstone Jul 06 '18 at 23:16
  • I've not been successful researching this part of your first suggestion. $headerPrefix = '+' * 20 # -> '++++++++++++++++++++' Could you explain the operators # -> , that is "hashtag hyphen greaterthan" Thanks. – fblackstone Jul 09 '18 at 21:07
  • @fblackstone: `#` is the to-the-end-of-the-line comment character in PowerShell, meaning that everything that follows it on the same line is a _comment_. PowerShell lets you use `*` for string replication so that `'+' * 20` creates a string composed of 20 `+` characters - the comment is simply meant to illustrate that fact by printing the resulting string; the `->` as part of the comment is simply informal shorthand to indicate a return value (output). – mklement0 Jul 09 '18 at 23:42
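
For the record, here is a rough sketch of the read-from-the-end idea discussed in the comments above, using .NET directly. It is only an illustration of the approach, not part of the answer proper: it assumes an ASCII-ish log, that the header line is not the very first line of the file, and that each disk read fills its buffer.

# Read backwards in chunks until the last header line is found (sketch only).
$path      = (Resolve-Path ./some.log).Path
$signature = "`n" + ('+' * 20)          # a header line preceded by a newline
$chunkSize = 64KB

$fs = [System.IO.File]::OpenRead($path)
try {
    $tail = ''
    $pos  = $fs.Length
    while ($pos -gt 0) {
        $readLen = [int][Math]::Min($chunkSize, $pos)
        $pos -= $readLen
        $fs.Position = $pos
        $buf = New-Object byte[] $readLen
        $null = $fs.Read($buf, 0, $readLen)
        # Prepend the chunk, so a signature split across chunk boundaries is still found.
        $tail = [Text.Encoding]::ASCII.GetString($buf) + $tail
        $idx  = $tail.LastIndexOf($signature)
        if ($idx -ge 0) {
            $tail.Substring($idx + 1)   # drop the leading newline; output the last block
            break
        }
    }
}
finally {
    $fs.Dispose()
}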

TLDR; Escape the + in your search, use "\+\+\+" etc.

Background

Unfortunately + is a reserved character in the world of regular expressions.

What is the meaning of + in a regex?

It tells the engine to match the previous search operator (either a character, a range, or a code representing a group of characters, like \d for digits) one or more times. You can see more information about this error in PowerShell by running the following:

[regex]$x = "++++"

Returns:

Cannot convert value "++++" to type "System.Text.RegularExpressions.Regex". Error: "parsing "++++" - Quantifier {x,y} following nothing."
At line:1 char:1
+ [regex]$x = "++++"
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : MetadataError: (:) [], ArgumentTransformationMetadataException
    + FullyQualifiedErrorId : RuntimeException

It is saying the quantifier (+) is following nothing.

So we need to escape the + using \:

[regex]$x = "\+\+\+\+"

$x.Match('++++')

Returning the following, a non-erroring match:

Groups   : {0}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 4
Value    : ++++

Improvement

If you know how many + there are, you can match on "\+{20}", if there are 20. Or from the previous example:

[regex]$x = "\+{4}"

$x.Match('++++')
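
If you'd rather not escape the metacharacters by hand at all, .NET can build the escaped pattern for you via [regex]::Escape (a small aside, not something the above relies on):

# Let .NET escape regex metacharacters for you:
[regex]$x = [regex]::Escape('++++')   # -> '\+\+\+\+'
$x.Match('++++').Value                # -> '++++'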
mjsqu
  • I will study regex and .match. Any suggestions for finding the last occurrence, and returning (from there to eof?). much appreciated. fb – fblackstone Jul 05 '18 at 00:27
  • The RegEx quantifier `{}` can also be used in a form `\+{4,20}`, meaning from 4 to 20 occurrences, or, if the second number is left away, any number: `\+{4,}` –  Jul 05 '18 at 12:04
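
To illustrate the quantifier ranges from the comment above (a quick check; the sample strings are arbitrary):

# {4,20} means 4 to 20 occurrences; {4,} means 4 or more:
'++++' -match '\+{4,20}'   # True
'++'   -match '\+{4,}'     # False (only 2 '+' characters)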

Another way using a RegEx to split the file into sections.

  • use Get-Content with the -Raw parameter to get one single string rather than an array of strings
  • use a nonconsuming positive lookahead, -split '(?=\+{20})', to split that string into sections that start with 20 + characters, and drop empty sections with -ne ''
  • use index [-1] to get the last section.

Sample output

PS> ((Get-Content '.\LogFile.txt' -raw) -split '(?=\+{20})' -ne '')[-1]
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
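
As an aside (my own variation, in case -Raw is not available, e.g. on PowerShell v2): [IO.File]::ReadAllText yields the same single string to split:

(([IO.File]::ReadAllText((Resolve-Path '.\LogFile.txt').ProviderPath)) -split '(?=\+{20})' -ne '')[-1]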

Personally, I'd change that logging format so that it is more object friendly and consume it as normal.

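For example, here is a sketch of what "object friendly" could look like (the file name, property names, and the $LASTEXITCODE check are placeholders of my own, not something the OP's setup implies):

# Append each backup run as a CSV row instead of free-form text:
[pscustomobject]@{
    TimeStamp   = Get-Date -Format 'yyyy-MM-dd HH:mm'
    Source      = 'K:\'
    Destination = 'J:\MyUSBBackups'
    Succeeded   = ($LASTEXITCODE -eq 0)   # assumes this runs right after xcopy
} | Export-Csv -Path '.\BackupLog.csv' -Append -NoTypeInformation

# Retrieving the most recent entry then becomes trivial:
Import-Csv '.\BackupLog.csv' | Select-Object -Last 1
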
However, based on what you have posted, here is one way to go at this. I am sure there are more elegant ways, but this is q&d (quick and dirty). Also, as a military vet (20+ years) who still lives and works on military time: 0900 is 9:00 AM, whereas 2100 is 9:00 PM. 8^} ... Just saying …

# Get the lines in the file
($DataSet = Get-Content -Path '.\LogFile.txt')

# Results

++++++++++++++++++++Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM



# Get the LastDateEntry, using a string match (RegEx)
($LastDateEntry = (Get-Content -Path '.\LogFile.txt' | %{$_ | Select-String -Pattern '[+].*'}) | Select -Last 1)

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM


# Get the LastDateEntryIndex
($DateIndex = (Get-Content -Path '.\LogFile.txt').IndexOf($LastDateEntry))

# Results

5



# Get the data using the index
ForEach($Line in $DataSet)
{
    If ($Line.ReadCount -ge $DateIndex)
    {
    Get-Content -Path '.\LogFile.txt' | Select-Object -Index ($Line.ReadCount)
    }
}

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
postanote
  • As written, I wouldn't call your solution quick (neither to write nor to execute), but it contains the seed of a much shorter, faster solution, which requires neither line-by-line iteration nor loading the file into memory as a whole (though it does involve reading the file _twice_): `$lineNo = (Select-String '[+].*' '.\LogFile.txt' | Select -Last 1).LineNumber; Get-Content '.\LogFile.txt' | Select -Skip ($lineNo - 1)` – mklement0 Jul 05 '18 at 15:24