PowerShell uses strings to store regexes and has no syntax for regex literals such as /.../ - nor for post-positional matching options such as g.
PowerShell is case-insensitive by default and requires opt-in for case-sensitivity (-CaseSensitive in the case of Select-String).
- Without that, [A-Z] is effectively the same as [A-Za-z] and therefore matches both upper- and lowercase (English) letters.
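A minimal sketch of the default case-insensitivity, using made-up sample strings. The -cmatch operator, which isn't mentioned above, is the case-sensitive counterpart of -match and is shown only for comparison:

```powershell
# Operators: -match ignores case, -cmatch respects it.
$matchDefault = 'abc' -match  '^[A-Z]+$'   # $true
$matchStrict  = 'abc' -cmatch '^[A-Z]+$'   # $false

# Select-String: case-insensitive unless -CaseSensitive is passed.
$sample  = 'abc', 'ABC'                    # hypothetical input lines
$default = ($sample | Select-String -Pattern '^[A-Z]+$').Line
$strict  = ($sample | Select-String -CaseSensitive -Pattern '^[A-Z]+$').Line

$default   # abc, ABC
$strict    # ABC
```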
The equivalent of the g option is Select-String's -AllMatches switch, which looks for all matches on each input line (by default, it only looks for the first).
Select-String doesn't output strings, i.e. not the matching lines directly, but wrapper objects of type [Microsoft.PowerShell.Commands.MatchInfo] with metadata about each match.
- Instances of that type have a .Matches property that contains an array of [System.Text.RegularExpressions.Match] instances, whose .Value property contains the text of each match (whereas the .Line property contains the matching line in full).
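To make the object structure concrete, here is a small sketch that inspects the output for a single made-up input line, with and without -AllMatches. The sample text and variable names are illustrative only:

```powershell
$line = 'NASA and ESA launched JWST'       # hypothetical input line

# Without -AllMatches only the first match per line is reported.
$first = $line | Select-String -CaseSensitive -Pattern '\b[A-Z]+\b'
# With -AllMatches every match on the line is reported.
$all   = $line | Select-String -CaseSensitive -AllMatches -Pattern '\b[A-Z]+\b'

$first.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo
$first.Matches.Count        # 1
$all.Matches.Count          # 3
$all.Matches.Value          # NASA, ESA, JWST
$all.Line                   # NASA and ESA launched JWST
```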
To put it all together:
$capwords = Get-Content -Raw $path |
  Select-String -CaseSensitive -AllMatches -Pattern '\b[A-Z]+\b' |
    ForEach-Object { $_.Matches.Value }
Note the use of -Raw with Get-Content, which greatly speeds up processing, because the entire file content is read as a single, multi-line string - essentially, Select-String then sees the entire content as a single "line". This optimization is possible because you're not interested in line-by-line processing and only care about what the regex captured, across all lines.
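The difference -Raw makes can be seen with a throwaway file. This is only a sketch: the temporary file and its two lines of content are made up for the demonstration.

```powershell
# Create a hypothetical two-line sample file.
$tmp = (New-TemporaryFile).FullName
Set-Content -LiteralPath $tmp -Value 'FOO bar', 'baz QUX'

$lines = Get-Content -LiteralPath $tmp        # array with one string per line
$raw   = Get-Content -LiteralPath $tmp -Raw   # one multi-line string

# Select-String receives a single input object, yet still finds the
# matches on both lines of the file.
$viaRaw = ($raw | Select-String -CaseSensitive -AllMatches -Pattern '\b[A-Z]+\b').Matches.Value

Remove-Item -LiteralPath $tmp

$lines.Count   # 2
$viaRaw        # FOO, QUX
```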
As an aside: $_.Matches.Value takes advantage of PowerShell's member-access enumeration, which you can similarly leverage to avoid having to loop over the paragraphs in $paras explicitly:
# Use member-access enumeration on collection $paras to get the .Range
# property values of all collection elements and access their .Text
# property value.
$paras.Range.Text | Out-File -FilePath $path
.NET API alternative:
The [regex]::Matches() .NET method allows for a more concise - and better-performing - alternative:
$capwords = [regex]::Matches((Get-Content -Raw $path), '\b[A-Z]+\b').Value
Note that, in contrast with PowerShell, the .NET regex APIs are case-sensitive by default, so no opt-in is required.
.Value again utilizes member-access enumeration in order to extract the matching text from all returned match-information objects.
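As a quick illustration of the .NET default, the following sketch runs the same pattern against a made-up in-memory string. The IgnoreCase option, passed via the three-argument [regex]::Matches() overload, is included only to contrast with PowerShell's default behavior:

```powershell
$text = "Alpha BETA gamma`nDELTA epsilon ZETA"   # hypothetical sample text

# Case-sensitive by default: only all-uppercase words match.
$upper = [regex]::Matches($text, '\b[A-Z]+\b').Value

# Opting in to case-insensitivity matches every word.
$anyCase = [regex]::Matches($text, '\b[A-Z]+\b', 'IgnoreCase').Value

$upper           # BETA, DELTA, ZETA
$anyCase.Count   # 6
```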