
I am kind of stuck with this regex search. The scenario is as follows:

  • I have a bunch of files with a certain extension (*.tlt) and arbitrary content
  • The files are spread across the subfolders of the BETA folder on drive F:
  • Each file contains at least one Revision 1.234 somewhere in its content (sometimes several times; only the first occurrence matters)

This is what I have so far:

$files = gci f:\beta\ -Include "*.tlt" -Recurse
$results = $files |
           Select-String -Pattern 'Revision:.+.{1}[.]\d{1,3}'|
           ForEach-Object { $_.Matches } |
           select Value |
           Format-Table -GroupBy Filename

What I need is a PowerShell script that searches through the files and returns a list of the files, each with its full path and ONLY the revision string (e.g. Revision 1.234), not the whole line.

Ansgar Wiechers
Tominko
  • Well, you are currently only selecting `value`, so you can't group by `Filename`. Also, you shouldn't pipe to `Format-Table` if you are going to store the results. Store the results, and then display them with `Format-Table` later if desired. If you are only using it to group things, use `Group-Object` instead. – TheMadTechnician Aug 21 '18 at 22:07
  • Thank you for your advice. I have now marked it as accepted and upvoted both. – Tominko Aug 23 '18 at 07:50
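A minimal sketch of TheMadTechnician's advice: keep the whole MatchInfo objects (so the file path is not lost), store them, group them with Group-Object, and format only when displaying. The temporary folder and the files a.tlt and sub/b.tlt are hypothetical stand-ins for F:\beta so the snippet can run anywhere.

```powershell
# Hypothetical test data: a throwaway folder standing in for F:\beta
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
$sub  = Join-Path $root 'sub'
$null = New-Item -ItemType Directory -Path $sub -Force
Set-Content -Path (Join-Path $root 'a.tlt') -Value 'Revision: 1.234', 'Revision: 1.235'
Set-Content -Path (Join-Path $sub 'b.tlt') -Value 'header', 'Revision: 2.001'

# Store the MatchInfo objects themselves; no Format-Table at this stage
$matchInfos = Get-ChildItem -Path $root -Filter *.tlt -Recurse |
    Select-String -Pattern 'Revision:.+?\.\d{1,3}'

# Group-Object (instead of Format-Table -GroupBy) does the grouping by full path
$groups = $matchInfos | Group-Object -Property Path

# Formatting happens only at display time; Group[0] is the first matching line in each file
$groups | ForEach-Object {
    [pscustomobject]@{ Path = $_.Name; FirstRevision = $_.Group[0].Matches[0].Value }
} | Format-Table -AutoSize

# Clean up the hypothetical test data
Remove-Item -LiteralPath $root -Recurse -Force
```

Because `$groups` still holds the original objects, it can be filtered, exported, or formatted differently later without re-running the search.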

2 Answers


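You were close, but you need to process each file individually. Note that -Filter is significantly faster than -Include: -Filter is handed to the filesystem provider, which filters at the source, whereas -Include makes PowerShell collect every object first and filter afterwards.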

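$fileList = Get-ChildItem -Path F:\beta -Filter *.tlt -Recurse
$results = foreach ($file in $fileList)
{
    # -List stops searching a file after its first match
    $find = $file | Select-String -Pattern 'Revision:.+?\.\d{1,3}' -List
    if ($find)
    {
        # [pscustomobject] (not a plain hashtable) so the results display as a table
        [pscustomobject]@{
            Path = $file.FullName
            Rev  = $find.Matches[0].Value
        }
    }
}
# Format only when displaying, so $results stays usable
$results | Format-Table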
Maximilian Burszley

A single-pipeline solution is possible with the help of calculated properties:

Get-ChildItem f:\beta -Filter *.tlt -Recurse | 
  Select-String -List -Pattern 'Revision:.+?\.\d{3}' |
    Select-Object @{ n='FullName'; e='Path' }, @{ n='Revision'; e={ $_.Matches.Value } } 

Sample output:

FullName                              Revision
--------                              --------
/Users/jdoe/foo.tlt                   Revision: 1.234
/Users/jdoe/sub/bar.tlt               Revision: 10.235
  • As mentioned in TheIncorrigible1's answer, using -Filter performs much better than using -Include, because -Filter filters at the source (lets the filesystem provider do the filtering) rather than collecting all file-info objects first and then letting PowerShell do the filtering.

  • Select-String -List limits matching in each input file to the first match.

  • Each match output by Select-String is a [Microsoft.PowerShell.Commands.MatchInfo] instance, which carries rich metadata about the match, such as .Path with the full input filename and .Matches with information about what the regex (-Pattern) matched; this metadata is used to populate the custom output objects created by Select-Object via the aforementioned calculated properties.
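To see the effect of -List and the MatchInfo properties described above, here is a small self-contained sketch. It assumes a temporary folder with two hypothetical files, foo.tlt and sub/bar.tlt, in place of f:\beta, and compares the number of MatchInfo objects with and without -List before building the same calculated-property output as the answer.

```powershell
# Hypothetical test data: a temp folder standing in for f:\beta
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
$sub  = Join-Path $root 'sub'
$null = New-Item -ItemType Directory -Path $sub -Force
Set-Content -Path (Join-Path $root 'foo.tlt') -Value 'Revision: 1.234', 'Revision: 1.999'
Set-Content -Path (Join-Path $sub 'bar.tlt') -Value 'no match here', 'Revision: 10.235'

# Without -List: one MatchInfo per matching line (3 here)
$all = Get-ChildItem $root -Filter *.tlt -Recurse |
    Select-String -Pattern 'Revision:.+?\.\d{3}'

# With -List: at most one MatchInfo per file, the first match (2 here)
$first = Get-ChildItem $root -Filter *.tlt -Recurse |
    Select-String -List -Pattern 'Revision:.+?\.\d{3}'

# .Path holds the full file path, .Matches the regex match(es) on that line
$report = $first |
    Select-Object @{ n='FullName'; e='Path' }, @{ n='Revision'; e={ $_.Matches.Value } }
$report | Format-Table -AutoSize

# Clean up the hypothetical test data
Remove-Item -LiteralPath $root -Recurse -Force
```

The second Revision in foo.tlt (1.999) shows up in `$all` but not in `$report`, which is exactly the "only the first appearance" requirement from the question.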

mklement0