2

I have a file which looks like this:

ABC01|01
Random data here 2131233154542542542
More random data
STRING-C
A bit more random stuff
&(%+
ABC02|01
Random data here 88888888
More random data 22222
STRING-D
A bit more random stuff
&(%+

I'm trying to write a script that finds everything between ABC01 and &(%+, but only if that block contains STRING-C.

I came up with this regex: ABC([\s\S]*?)STRING-C(?s)(.*?)&\(%\+

I'm getting this content from a text file with get-content.

$bad_content = gc $bad_file -raw

I want to do something like $bad_content.replace($pattern, "") to remove the regex match.

How can I replace my matches in the file with nothing? I'm not even sure my regex is correct, but on regex101 it seems to find the strings I need.

shadow2020

2 Answers

3

We can use a tempered dot trick when matching between the two markers to ensure that we don't cross the ending marker before matching STRING-C:

ABC01(?:(?!&\(%\+)[\s\S])*?STRING-C[\s\S]*?&\(%\+

Demo

Here is an explanation of the regex pattern:

ABC01                   match the starting marker
(?:(?!&\(%\+)[\s\S])*?  without crossing the ending marker
STRING-C                match the nearest STRING-C marker
[\s\S]*?                then match all content, across lines, until reaching
&\(%\+                  the ending marker
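
To see what the tempered dot buys you, here is a minimal PowerShell sketch. The sample text is hypothetical (the ABC01 block deliberately lacks STRING-C), and `$untempered` is a made-up comparison pattern that simply drops the lookahead; neither comes from the question.

```powershell
# Hypothetical sample: the ABC01 block has no STRING-C, but a later block does.
$sample = @'
ABC01|01
STRING-D
&(%+
ABC02|01
STRING-C
&(%+
'@

# Comparison pattern (assumption): same as above but without the lookahead,
# so the lazy [\s\S]*? is free to run past the first &(%+.
$untempered = 'ABC01[\s\S]*?STRING-C[\s\S]*?&\(%\+'

# The pattern from this answer: no character consumed before STRING-C
# may be the start of an end marker.
$tempered = 'ABC01(?:(?!&\(%\+)[\s\S])*?STRING-C[\s\S]*?&\(%\+'

$crossesBlocks = [regex]::IsMatch($sample, $untempered)  # True: runs into the ABC02 block
$staysInBlock  = [regex]::IsMatch($sample, $tempered)    # False: ABC01 block has no STRING-C

$crossesBlocks
$staysInBlock
```

Note that the static `[regex]` methods take the input string first and the pattern second.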
Tim Biegeleisen
  • Love the solution but I'm getting the same result as my own pattern. If I do this, `[regex]::match($pattern, $exampledata)`, I get a value of STRING-C and nothing else. I need the entire text to be replaced, not just STRING-C. – shadow2020 Jul 21 '21 at 16:29
  • If you check the regex demo link, you'll see that my pattern does in fact match the entire block from start to end marker. There must be some other issue with PowerShell (with which, sadly, I have zero experience). – Tim Biegeleisen Jul 21 '21 at 16:30
  • @shadow2020 Tim's regex works perfectly fine; you should be doing `[regex]::Matches($exampledata, $pattern)` (input first, then pattern), not the other way around. – Santiago Squarzon Jul 21 '21 at 16:39
3

Your regex works with the sample input given, but not robustly, because if the order of blocks were reversed, it would mistakenly match across the blocks and remove both.
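
A quick way to check this is to run both patterns against a reordered sample. The sketch below assumes a shortened, hypothetical version of the question's data with the two blocks swapped; the variable names are illustrative only.

```powershell
# Hypothetical sample: same kind of data as in the question, blocks reversed.
$reversed = @'
ABC02|01
Random data here 88888888
STRING-D
&(%+
ABC01|01
Random data here 2131233154542542542
STRING-C
&(%+
'@

$original = 'ABC([\s\S]*?)STRING-C(?s)(.*?)&\(%\+'                    # regex from the question
$fixed    = 'ABC01(?:(?!&\(%\+)[\s\S])*?STRING-C[\s\S]*?&\(%\+'      # regex from Tim's answer

# -replace with no replacement operand removes whatever was matched.
$afterOriginal = $reversed -replace $original
$afterFixed    = $reversed -replace $fixed

$afterOriginal.Contains('STRING-D')  # False: the ABC02 block was removed as well
$afterFixed.Contains('STRING-D')     # True:  only the STRING-C block was removed
```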

Tim Biegeleisen's helpful answer shows a regex that fixes the problem, via a negative lookahead assertion ((?!...)).

Let me show how to make it work from PowerShell:

  • To apply it, you need to use the regex-based -replace operator, not the literal-substring-based .Replace() method.[1]

  • To read the input string from a file, use Get-Content's -Raw switch to ensure that the file is read as a single, multi-line string; by default, Get-Content returns an array (stream) of lines, which would cause the -replace operation to be applied to each line individually.

(Get-Content -Raw file.txt) -replace '(?s)ABC01(?:(?!&\(%\+).)*?STRING-C.*?&\(%\+'

Not specifying replacement text (as the optional 2nd RHS operand to -replace) replaces the match with the empty string and therefore effectively removes what was matched.

The regex borrowed from Tim's answer is simplified a bit, by using the inline method of specifying matching options to turn on the single-line option ((?s)) at the start of the expression, which makes subsequent . instances match newlines too (a shorter and more efficient alternative to [\s\S]).
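
To update the file itself, the result still has to be written back. The following is a minimal sketch under some assumptions: it works on a hypothetical scratch file in the temp directory (`bad_file_demo.txt`) rather than your real file, and it uses `-NoNewline` (PowerShell 5.0 or later) so that Set-Content does not append an extra trailing newline.

```powershell
# Hypothetical scratch file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bad_file_demo.txt'
$sampleText = @'
ABC01|01
Random data here 2131233154542542542
STRING-C
&(%+
ABC02|01
Random data here 88888888
STRING-D
&(%+
'@
Set-Content -LiteralPath $path -Value $sampleText

$pattern = '(?s)ABC01(?:(?!&\(%\+).)*?STRING-C.*?&\(%\+'

# Read the whole file as one string, remove the match, then write it back.
# Storing the result first ensures reading has finished before the file is rewritten.
$cleaned = (Get-Content -Raw -LiteralPath $path) -replace $pattern
Set-Content -LiteralPath $path -Value $cleaned -NoNewline

# Re-read to inspect what is left, then clean up the scratch file.
$result = Get-Content -Raw -LiteralPath $path
Remove-Item -LiteralPath $path
$result
```

The line break that followed the removed block stays behind; call `.TrimStart()` on `$cleaned` before writing if a leading blank line is unwanted.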


[1] See this answer for the juxtaposition of the two, including guidance on when to use which.

mklement0
  • Glad to hear it, @shadow2020. Kudos to [Tim Biegeleisen](https://stackoverflow.com/users/1863229/tim-biegeleisen) for coming up with the robust regex. – mklement0 Jul 21 '21 at 17:44