
I'm trying to get the first block of release notes...
(See sample content in the code)

Whenever I use something simple it works;
it only breaks when I try to search across multiple lines (\n).
I'm using (Get-Content $changelog | Out-String) because that gives back a single string instead of an array with one element per line.

$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'

(Get-Content $changelog | Out-String) | Select-String -Pattern $regex -AllMatches

<#
SAMPLE:
------
v1.0.23
- Adds an IContainer API.
- Bugfixes.

v1.0.22
- Hotfix: Language operators.

v1.0.21
- Support duplicate query parameters.

v1.0.20
- Splitting up the ICommand interface.
- Fixing the referrer header empty field value.

#>

The result I need is:

v1.0.23
- Adds an IContainer API.
- Bugfixes.

Update:

Using options..

$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'

Get-Content -Path $changelog -Raw | Select-String -Pattern $regex -AllMatches

I also get nothing (no matter whether I use \n or \r\n).

grmbl
  • At first glance it seems you may need to consider carriage returns before `\n`, i.e. `\r\n` – revo Oct 03 '18 at 19:03
  • When I use '([Vv][0-9]+\.[0-9]+\.[0-9]+\r\n)' as pattern the console returns the entire contents of the file? Even when I only use '([Vv][0-9]+\.[0-9]+\.[0-9]+)'..? – grmbl Oct 03 '18 at 19:08

1 Answer

  • Unless you're stuck with PowerShell v2, it's simpler and more efficient to use Get-Content -Raw to read an entire file as a single string; besides, Out-String adds an extra newline to the string.[1]
  • Since you're only looking for the first match, you can use the -match operator - no need for Select-String's -AllMatches switch.
    • Note: While you could use Select-String without it, it is more efficient to use the -match operator, given that you've read the entire file into memory already.
  • Regex matching is case-insensitive by default in PowerShell, consistent with PowerShell's overall case-insensitivity, so there is no need for [Vv] or the i option.

Thus, the following returns the first block, if any:

if ((Get-Content -Raw $changelog) -match '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+') { 
  # Match found - output it.
  $Matches[0] 
}

  • (?m) turns on the inline regex option m (multi-line), which causes the anchors ^ and $ to match the beginning and end of individual lines rather than of the overall string.
  • \r?\n matches both CRLF and LF-only newlines.
  • You could make the regex slightly more efficient by making the (...) subexpression non-capturing, given that you're not interested in what it captured: (?:...).

Note that -match itself returns a Boolean (with a scalar LHS), but information about the match is recorded in the automatic $Matches hashtable variable, whose entry 0 contains the overall match.
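
As a minimal sketch of the above, the following runs the same kind of regex against an in-memory copy of part of the sample rather than a file. The variable names $sample, $found and $firstBlock are only for illustration, and the (?:...) group is the non-capturing variant mentioned in the list above.

```powershell
# Hypothetical in-memory stand-in for the changelog file (LF-only newlines).
$sample = "v1.0.23`n- Adds an IContainer API.`n- Bugfixes.`n`nv1.0.22`n- Hotfix: Language operators.`n"

# -match returns a Boolean for a scalar LHS and fills $Matches on success.
$found = $sample -match '(?m)^v\d+\.\d+\.\d+.*(?:\r?\n-\s?.*)+'

# $Matches[0] holds the overall match: the first version line plus its bullet lines.
$firstBlock = if ($found) { $Matches[0] } else { $null }
$firstBlock
```

The match stops at the blank line after the first block, because the repeated group requires each following line to start with a hyphen.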


As for what you tried:

'([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'

doesn't work, because by default ^ and $ only match at the very start and the very end of the input string ($ possibly before a final newline), so nothing matches. To make them match at the start and end of each line, you have to turn on the multiline regex option (which you did in your 2nd attempt).
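
The difference is easy to see on a small string. A minimal sketch, with a made-up two-line input, that counts how often the bullet-line subexpression ^-.*$ matches with and without the multi-line option:

```powershell
# Made-up input: two bullet lines separated by an LF.
$text = "- one`n- two"

# Without (?m), ^ and $ anchor to the whole string only, and . stops at the
# newline, so no single line can satisfy ^-.*$ here.
$plainCount = [regex]::Matches($text, '^-.*$').Count

# With (?m), ^ and $ also match at the start and end of each line.
$multiCount = [regex]::Matches($text, '(?m)^-.*$').Count

"without (?m): $plainCount; with (?m): $multiCount"
```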

'(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'

doesn't work as intended, because by using option s (single-line) you've made . match newlines too, so that a greedy subexpression such as .* will match the remainder of the string, across lines. As a result, everything from the first block on matches.
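
A minimal sketch of that effect, using a shortened, made-up two-block changelog: with option s, the greedy .* runs on across the remaining lines, whereas without it the match ends after the first block.

```powershell
# Shortened, made-up changelog with two blocks (LF-only newlines).
$text = "v1.0.1`n- second`n`nv1.0.0`n- first`n"

# With s, . also matches newlines, so the greedy .* swallows the rest of the string.
$withS = [regex]::Match($text, '(?smi)(v[0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+').Value

# Without s, .* stays on one line and the match ends after the first block's bullet.
$withoutS = [regex]::Match($text, '(?mi)(v[0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+').Value

"with s: $($withS.Length) chars; without s: $($withoutS.Length) chars"
```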


[1] This problematic behavior is discussed in GitHub issue #14444.

mklement0