I have a list of files that contain either of the two strings:

"stuff" or ";stuff"

I'm trying to write a PowerShell script that will return only the files that contain "stuff". The script below currently returns all the files because, obviously, "stuff" is a substring of ";stuff".

For the life of me, I cannot figure out how to match only the files that contain "stuff" without a preceding semicolon (;).

Get-Content "C:\temp\list\list.txt" |
  Where-Object { Select-String -Quiet -Pattern "stuff" -SimpleMatch $_ }

Note: C:\temp\list\list.txt contains a list of file paths that are each passed to Select-String.

Thanks for the help.

mklement0
Jay Schwegler
  • How about a regex? `Select-String -Pattern '[^;]stuff'` – lit Feb 28 '19 at 18:37
  • If your goal is to select patterns that are `stuff` literally and not `;stuff`, you can use a negated set (don't use `-SimpleMatch`): `-Pattern '[^;]stuff'` – Maximilian Burszley Feb 28 '19 at 18:37
  • Using `[^;]stuff` produces zero results, and I'm not sure why, as it does make sense. After playing around, `^stuff` does produce the result that I am looking for, but I still don't know why one works and the other doesn't. – Jay Schwegler Feb 28 '19 at 21:24
  • An alternative is `[^;]?`, but a negative lookbehind is the right tool for the job. – Maximilian Burszley Feb 28 '19 at 22:10
  • @TheIncorrigible1: `[^;]?stuff` is tempting too, but also doesn't work: `';stuff' -match '[^;]?stuff'` -> `$true` – mklement0 Feb 28 '19 at 23:03

3 Answers

You cannot perform the desired matching with literal substring searches (-SimpleMatch).

Instead, use a regex with a negative look-behind assertion (`(?<!...)`) to rule out `stuff` substrings preceded by a `;` character: `(?<!;)stuff`

Applied to your command:

Get-Content "C:\temp\list\list.txt" | 
  Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }
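To check the command end to end, here is a minimal sketch that builds a throwaway folder containing two hypothetical files (`plain.txt` with the line `stuff`, `commented.txt` with only `;stuff`) plus a `list.txt` listing their full paths. The folder and file names are made up for the demo; it assumes PowerShell v3 or later.

```powershell
# Hypothetical test setup: a unique temp folder with two sample files.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir

$plain     = Join-Path $dir 'plain.txt'      # contains "stuff" with no leading ;
$commented = Join-Path $dir 'commented.txt'  # contains only ";stuff"
Set-Content -LiteralPath $plain     -Value 'stuff'
Set-Content -LiteralPath $commented -Value ';stuff'

# The list file: one full path per line, like C:\temp\list\list.txt.
$list = Join-Path $dir 'list.txt'
Set-Content -LiteralPath $list -Value $plain, $commented

# The command from this answer, pointed at the hypothetical list file.
$found = Get-Content -LiteralPath $list |
  Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }

$found  # should contain only the path of plain.txt

# Clean up the throwaway folder.
Remove-Item -LiteralPath $dir -Recurse
```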

Regex pitfalls:

  • It is tempting to use [^;]stuff instead, using a negated (^) character set ([...]) (see this answer); however, this will not work as expected if stuff appears at the very start of a line, because a character set - whether negated or not - only matches an actual character, not the start-of-the-line position.

  • It is then tempting to apply ? to the negated character set (for an optional match - 0 or 1 occurrence): [^;]?stuff. However, that would match a string containing ;stuff again, given that stuff is technically preceded by a "0-repeat occurrence" of the negated character set; thus, ';stuff' -match '[^;]?stuff' yields $true.

Only a look-behind assertion works properly in this case - see regular-expressions.info.
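The pitfalls above can be checked directly with `-match` on a few illustrative sample lines (the sample strings are made up for the demo). This sketch collects one result object per pattern/line pair:

```powershell
# Illustrative sample lines: "stuff" at the start of a line,
# "stuff" after some other character, and the unwanted ";stuff".
$samples  = 'stuff', 'xstuff', ';stuff'

# The three candidate patterns discussed above.
$patterns = '[^;]stuff', '[^;]?stuff', '(?<!;)stuff'

# One result object per pattern/sample combination.
$results = foreach ($pattern in $patterns) {
    foreach ($sample in $samples) {
        [pscustomobject]@{
            Pattern = $pattern
            Text    = $sample
            # -match is a case-insensitive regex match returning $true or $false
            IsMatch = $sample -match $pattern
        }
    }
}

$results | Format-Table -AutoSize
```

In line with the explanation above, `[^;]stuff` misses `stuff` at the start of a line, `[^;]?stuff` also accepts `;stuff`, and only `(?<!;)stuff` yields `True` for `stuff` and `xstuff` and `False` for `;stuff`. The first pitfall is likely why `[^;]stuff` produced zero results in the comments: if `stuff` sits at the start of each line, there is no preceding character for the set to consume.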

mklement0

To complement @mklement0's answer, I suggest an alternative approach to make your code easier to read and understand:

#requires -Version 4
@(Get-Content -Path 'C:\Temp\list\list.txt').
    ForEach([IO.FileInfo]).
    Where({ $PSItem | Select-String -Pattern '(?<!;)stuff' -Quiet })

This turns your strings into file objects (`System.IO.FileInfo`) and uses the `.ForEach()` and `.Where()` array methods for brevity. Further, it lets you pipe the paths into `Select-String` as file objects instead of passing them as a parameter argument, which I find more understandable (I find long lists of parameters difficult to read).

Maximilian Burszley
  • Thanks. I was initially playing with file-info input as well, though in the end I think simply changing `$_` to `-LiteralPath $_` would have made the OP's code clearer. You may have chosen `@(...)` for conceptual clarity, but let me point out that `.ForEach()` and `.Where()` work even on scalars. – mklement0 Feb 28 '19 at 22:14
  • @mklement0 Do they? I've had issues with that assumption where exceptions were thrown for non-arrays, but that might've been an edge case or incorrect usage. – Maximilian Burszley Feb 28 '19 at 22:15
  • AFAIK, yes: `'foo'.foreach({ "[$_]" })`, `(1).foreach({ "[$_]" })`. Can you give an example where it doesn't work? – mklement0 Feb 28 '19 at 22:18
  • @mklement0 Not off the top of my head. It bit me once so I moved to using `@( )` wherever I need or want `ForEach`/`Where` – Maximilian Burszley Feb 28 '19 at 22:19
  • I see; `@(...)` certainly doesn't hurt, but it would be good to know whether use on scalars can be relied upon; if you do find an exception, please file an issue on GitHub. – mklement0 Feb 28 '19 at 22:20

The example code posted won't actually run, as it will look at each line as the -Path value.

What you need is to get the content, select the string you're after, and then filter the results with Where-Object:

Get-Content "C:\temp\list\list.txt" | Select-String -Pattern "stuff" | Where-Object {$_ -notmatch ";stuff"}

You could create a more complex regex if needed, but that depends on what the result data from your files looks like.

trebleCode
  • `list.txt` is a _list of files_ whose content should be searched. Your command searches the list itself instead. – mklement0 Feb 28 '19 at 22:16