
I am new to PowerShell, so please bear with me.

I have this pattern

(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})

to match this text:

111 is different from:
111 is different from:
123 is different from:
567.

This gives only 1 match, although the text contains 2 instances. How can both be matched? The first match consumes the 123, so the second instance, which starts with that same 123, is never found. I had to repeat the line several times to work around this. I believe there are better ways. Please help.

I tried turning the part of the pattern that matches 123 into a lookahead, but then I couldn't capture the 123.

Goal: I want to insert a line, a sentence, between the two different values.

EDIT: like this

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
  *** This value 567 ***
567. 
mklement0
Eddie
  • Your pattern does not seem to match anything, see https://regex101.com/r/CO7siv/1 In the example data, group 1 is always empty as the string starts with 3 digits. The negative lookahead after the newline will not succeed. Can you update the question with an example of the desired result? – The fourth bird Jan 14 '23 at 09:36
  • The character ^ indicates beginning of line and $ indicates end of line. You do not put the return characters into a Regex. So use "^\d{3}.*:$" – jdweng Jan 14 '23 at 10:32
  • @The fourth bird, sorry.. The back reference should be \2... It works OK in Regex Storm tester. But matches only one instance. As 123 was already consumed in the first instance. – Eddie Jan 14 '23 at 12:43
  • @jdweng, I work with a single line, and am not sure about putting ^ and $. Usually the regex works fine. – Eddie Jan 14 '23 at 12:45
  • Please provide a better example and also show us the desired output, because right now it is unclear what you are trying to achieve. – Theo Jan 14 '23 at 13:10
  • Trust me. I've been working with Regex for a long time. – jdweng Jan 14 '23 at 15:05
  • @Eddie Do you mean like this? `^(?=(\d{3})(.*:\r?\n)(?!\1)(\d{3}))` See the 3 capture group values https://regex101.com/r/N0rCk2/1 – The fourth bird Jan 14 '23 at 15:30
  • To answer the question, the -replace (regex) operator always performs a global replace, whether the input is a string array or a raw string with line endings. – js2010 Jan 14 '23 at 16:36
  • @js2010, yes, I understand that. But the replacement is always forward. The position is always at the end of the pattern consumed. And in this case it consumes the start part of the next replace. – Eddie Jan 14 '23 at 19:35
  • Please edit the question and add a reproducible problem. – js2010 Jan 14 '23 at 19:39
  • Edited. With an example. Thanks. – Eddie Jan 14 '23 at 19:44
  • @The fourth bird... Yes. That works. I thought groups inside a lookahead were never capturing! Important lesson for me... Thanks so much. – Eddie Jan 14 '23 at 20:14
  • I choose this as the answer. – Eddie Jan 14 '23 at 20:21
  • @Eddie Did you mean the suggestion I added in the comments? That will not give you the desired replacement. In that way you can use `^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)` See https://regex101.com/r/wyG5AD/1 – The fourth bird Jan 14 '23 at 20:34

2 Answers


Note that PowerShell's -replace operator is invariably global, i.e. it always looks for and replaces all matches of the given regex.

Use the following -replace operation instead:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
            '$0  *** This value $3 ***$2'

Note: The @'<newline>...<newline>'@ string literal used for the multiline input text is a so-called here-string.

Output:

111 is different from:
111 is different from:
  *** This value 123 ***
123 is different from:
  *** This value 567 ***
567.
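To make the difference from the question's pattern concrete, here is a minimal sketch that counts the matches of both patterns with [regex]::Matches(). The variable names are made up for illustration, and the sample text is joined with explicit LF newlines (an assumption, so that the result doesn't depend on how a script file is saved).

```powershell
# Build the sample text with explicit LF newlines.
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

# Pattern from the question: the next line's digits are part of the match,
# so they are consumed and cannot start the following match.
$consuming = '(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})'

# Pattern from this answer: the next line's digits are only looked at.
$lookahead = '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)'

$consumingCount = [regex]::Matches($text, $consuming).Count
$lookaheadCount = [regex]::Matches($text, $lookahead).Count

"consuming: $consumingCount, lookahead: $lookaheadCount"
# → consuming: 1, lookahead: 2
```

Because the lookahead leaves the 123 unconsumed, the regex engine can start the second match at that very line.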
  • For a detailed explanation of the regex and the ability to experiment with it, see this regex101.com page, but in short:

    • (?m) is the inline form of the Multiline .NET regex option, which makes ^ and $ match at the start and end of each line.

    • ^(\d{3}) therefore matches a 3-digit sequence only at the start of a line, in a capture group, and the part that follows, a literal space plus .+:, matches a space and at least one additional character on the same line, all the way to a : at the end of the line.

    • (\r?\n) captures the specific newline sequence encountered (which may be CRLF (Windows-format) or just LF (Unix-format)) in a 2nd capture group.

      • Capturing the specific newline sequence allows you to replicate it in the substitution string via placeholder $2, to ensure that the newly inserted line is terminated with the same sequence.

      • If you don't care about potentially mixing \r\n and \n in the resulting string, you could omit the 2nd capture group and use "`n" (sic) or "`r`n" instead, using an expandable string ("...") with an escape sequence - note that, unlike in C#, \r and \n are not recognized in PowerShell string literals (it is only the .NET regex engine that recognizes them, but not in the substitution operand of -replace, which is not a regex, and where only $-prefixed placeholders are recognized).

        # Conceptually cleaner: separate the verbatim part from
        # the expandable part.
        ('$0  *** This value $2 ***' + "`n")
        
        # Alternative, using a single "..." string
        # The '$' chars. that are part of -replace *placeholders*
        # must be *escaped as `$* to prevent up-front expansion by PowerShell
        "`$0  *** This value `$2 ***`n"
        
    • (?!\1)(?=(\d{3})\b) uses both a negative ((?!...)) and positive (?=...) lookahead assertion to look for 3 digits at the start of the next line (at a word boundary, due to \b) that aren't the same as the 3 digits on the current line (\1 being a backreference to what the 1st capture group matched).

      • Note that using a capture group inside an overall by-definition non-capturing lookaround assertion is possible, and indeed used above to capture the 3-digit sequence at the start of the subsequent line, referenced via placeholder $3 in the substitution string.
    • In the substitution string, $0, $2 and $3 refer to what the entire regex, the 2nd capture group, and the 3rd capture group matched, respectively ($& may be used in lieu of $0; see this answer for more info about these placeholders).

      • Note that by using a string as the substitution operand, you are limited to embedding captured text as-is, via placeholders such as $0. If you need to determine the substitution text fully dynamically, i.e. if you need to apply transformations based on each match:

        • In PowerShell (Core) 7+, you can use a script block { ... } instead.

        • In Windows PowerShell, you'll have to call the underlying [regex]::Replace() method directly.

      • See below.


To spell out the fully dynamic substitution approach, adding 1 to the captured number in this example:

PowerShell (Core) 7+ solution, using a script block ({ ... }) as -replace's substitution operand:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', {
               '{0}  *** This value + 1: {1} ***{2}' -f $_.Value, ([int] $_.Groups[3].Value + 1), $_.Groups[2].Value
            }

Windows PowerShell solution, where a direct call to the underlying [regex]::Replace() method is required:

$str = @'
111 is different from:
111 is different from:
123 is different from:
567.
'@

[regex]::Replace(
  $str, 
  '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
  {
    param($m)
    '{0}  *** This value + 1: {1} ***{2}' -f $m.Value, ([int] $m.Groups[3].Value + 1), $m.Groups[2].Value
  }
)

Output (note that 1 has been added to each captured value):

111 is different from:
111 is different from:
  *** This value + 1: 124 ***
123 is different from:
  *** This value + 1: 568 ***
567.
mklement0

You can use 2 capture groups: the first one is referenced in a negative lookahead, and the second one supplies the value needed in the replacement.

^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)

In the replacement use the full match and group 2:

$0\n   *** This value $2 ***

See a .NET regex101 demo.

Output

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
   *** This value 567 ***
567.
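As a hedged sketch of how this could be applied with PowerShell's -replace operator: the (?m) inline option is added so that ^ matches at the start of every line, and the newline in the replacement comes from a PowerShell "`n" escape, because \n is not recognized in a .NET substitution string. The variable names are illustrative only, and the sample text is assumed to use LF newlines.

```powershell
# Sample text, joined with explicit LF newlines.
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

# (?m) lets ^ match at the start of each line, not just at the start of the string.
$pattern = '(?m)^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)'

# The single-quoted parts keep $0 and $2 verbatim for -replace;
# the double-quoted "`n" contributes a real LF character.
$replacement = '$0' + "`n" + '   *** This value $2 ***'

$result = $text -replace $pattern, $replacement
$result
```

Note that this always inserts an LF, so input that uses CRLF newlines would end up with mixed line endings.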

If you instead want to match only the position at the start of a line whose next line does not start with the same digits, the whole pattern can be placed inside a positive lookahead assertion:

^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)

See another .NET regex101 demo.
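A small sketch of what this position-only variant yields when called via [regex]::Matches() in PowerShell, assuming the (?m) option is added and the text uses LF newlines. The property names Current and Next are made up for this example.

```powershell
# Sample text, joined with explicit LF newlines.
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

$pattern = '(?m)^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)'

# Each match is zero-length, but the groups inside the lookahead
# still hold the digits of the current and the next line.
$found = foreach ($m in [regex]::Matches($text, $pattern)) {
  [pscustomobject]@{
    Length  = $m.Length
    Current = $m.Groups[1].Value
    Next    = $m.Groups[3].Value
  }
}

$found
```

Since nothing is consumed, a replacement based on this pattern inserts text at the start of the matching line rather than after it.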

The fourth bird