Note:
I'm assuming you're looking for the whole line on which BOURKE appears as a substring.
In your own attempts, (?<BOURKE>...) simply gives the regex capture group a self-chosen name (BOURKE), which is unrelated to what the capture group's subexpression (...) actually matches.
For the use case at hand, there's no strict need for a (named) capture group at all, so the solutions below make do without one. With the -match operator, this means that a successful match is reported in index [0] of the automatic $Matches variable, as shown below.
If your multiline input string contains only Unix-format LF newlines (\n), use the following:
if ($multiLineStr -match '.*BOURKE.*') { $Matches[0] }
Note:
- To match case-sensitively, use -cmatch instead of -match.
- If you know that the substring is preceded / followed by at least one character, use .+ instead of .* on the relevant side.
- If you want to search for the substring verbatim and it contains, or may contain, regex metacharacters (e.g. .), apply [regex]::Escape() to it; e.g., [regex]::Escape('file.txt') yields file\.txt (with the metacharacter \-escaped).
- If necessary, add constraints for disambiguation, such as requiring that the substring start or end only at word boundaries (\b).
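As a minimal sketch of how the refinements above can be combined (the sample string and the names $needle, $escaped and $pattern are made up for illustration and are not part of the question's input):

```powershell
# Hypothetical sample input; the search term contains a regex metacharacter (.).
$multiLineStr = "see fileXtxt here`nopen file.txt now`nprofile.txt.bak"
$needle = 'file.txt'

# Escape the search term so that . matches only a literal period.
$escaped = [regex]::Escape($needle)   # file\.txt

# \b requires a word boundary right before the substring,
# which rules out 'profile.txt.bak'.
$pattern = ".*\b$escaped.*"

# -cmatch performs the match case-sensitively.
$result = if ($multiLineStr -cmatch $pattern) { $Matches[0] }
$result
```

Without the [regex]::Escape() call, the unescaped . would also match the X in the first line, so that line would be returned instead.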
If there's a chance that Windows-format CRLF newlines (\r\n) are present, use:
if ($multiLineStr -match '.*BOURKE[^\r\n]*') { $Matches[0] }
For an explanation of the regexes and the ability to experiment with them, see this regex101.com page for .*BOURKE.*, and this one for .*BOURKE[^\r\n]*.
In short:
- By default, . matches any character except \n, which obviates the need for line-specific anchors (^ and $) altogether, but with CRLF newlines requires excluding \r so as not to capture it as part of the match.[1]
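To see the difference, here is a small sketch that runs both regexes against a hypothetical CRLF-terminated sample string ($crlfStr, $withDot and $withClass are illustrative names only):

```powershell
# Hypothetical sample with Windows-format CRLF newlines.
$crlfStr = "first`r`n001 BOURKE, Bridget`r`nlast"

# .* stops only at \n, so the trailing \r becomes part of the match.
$null = $crlfStr -match '.*BOURKE.*'
$withDot = $Matches[0]

# [^\r\n]* stops at the \r, so the match is the line's text only.
$null = $crlfStr -match '.*BOURKE[^\r\n]*'
$withClass = $Matches[0]

$withDot.EndsWith("`r")     # the first regex captured the carriage return
$withClass.EndsWith("`r")   # the second one did not
```

The stray \r is invisible when printed to the console, which is why it tends to surface only later, e.g. in string comparisons or when writing the result to a file.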
Two asides:
PowerShell's -match operator only ever looks for one match; if you need to find all matches, you currently need to use the underlying [regex] API directly; e.g., [regex]::Matches($multiLineStr, '.*BOURKE[^\r\n]*', 'IgnoreCase').Value
GitHub issue #7867 suggests bringing this functionality directly to PowerShell in the form of a -matchall operator.
If you want to anchor the substring to find, i.e. if you want to stipulate that it either occur at the start or at the end of a line, you need to switch to multi-line mode ((?m)), which makes ^ and $ match on each line; e.g., to only match if BOURKE occurs at the very start of a line:
if ($multiLineStr -match '(?m)^BOURKE[^\r\n]*') { $Matches[0] }
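Both asides can be tried out together. The following is only a sketch against a hypothetical three-line sample (the string and the variable names $allLines and $anchored are assumptions for illustration); it uses [regex]::Matches() with the 'IgnoreCase' option passed as the third argument, so that matching stays case-insensitive as with -match:

```powershell
# Hypothetical sample with CRLF newlines and mixed-case occurrences.
$multiLineStr = "001 BOURKE, Bridget`r`nsee BOURKE above`r`nBourke, David"

# All lines that contain the substring anywhere.
$allLines = [regex]::Matches($multiLineStr, '.*BOURKE[^\r\n]*', 'IgnoreCase').Value

# Only lines that start with the substring: (?m) makes ^ match at each line start.
$anchored = [regex]::Matches($multiLineStr, '(?m)^BOURKE[^\r\n]*', 'IgnoreCase').Value

$allLines   # all three lines
$anchored   # only the last line
```

Accessing .Value on the returned collection relies on member-access enumeration, which requires PowerShell version 3 or later.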
If line-by-line processing is an option:
Line-by-line processing has the advantage that you needn't worry about differences in newline formats (assuming the utility handling the splitting into lines can handle both newline formats, which is true of PowerShell in general).
If you're reading the input text from a file, the Select-String cmdlet, whose very purpose is to find the whole lines on which a given regex or literal substring (-SimpleMatch) matches, additionally offers streaming processing, i.e. it reads lines one by one, without the need to read the whole file into memory.
(Select-String -LiteralPath file.txt -Pattern BOURKE).Line
Add -CaseSensitive for case-sensitive matching.
The following example simulates the above (-split '\r?\n' splits the multiline input string into individual lines, recognizing either newline format):
(
@'
initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text
'@ -split '\r?\n' |
Select-String -Pattern BOURKE
).Line
Output:
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
[1] Strictly speaking, the [^\r\n]* would also stop matching at a \r character in isolation (i.e., even if not directly followed by \n). If ruling out that case is important (which seems unlikely), use a (simplified version of) the regex suggested by Mathias R. Jessen in a comment on the question: .*BOURKE.*?(?=\r?\n)