I have an HTML file that contains two marker lines with text between them.

<!-- START -->
asdf
<!-- END -->

Anything can stand between those two markers, and the data changes, so it is not the same every time. Is it possible to erase all lines between the two markers?

I have tried this regex:

(?sm)<!-- START -->.*?(?=^<!-- END -->)

but the match always starts at the first marker line and not on the line below it.

Can someone help me make the regex start after the START marker, so that I can then delete what it matches?

s0Nic
  • Use a parser that understands HTML. A [regex doesn't](https://stackoverflow.com/q/6751105/503046) work with HTML. Try, say, `html agility pack`. – vonPryz Aug 14 '20 at 08:26
  • But it stops at the right place; only the beginning is wrong. – s0Nic Aug 14 '20 at 08:28
  • It will not select the second line due to the lookahead `(?=^<!-- END -->)`. You could try a capturing group and use the group in the replacement `(?sm)<!-- START -->\r?\n(.*?)\r?\n<!-- END -->` https://regex101.com/r/CGun4i/1 but HTML and regex are usually not a good combination. – The fourth bird Aug 14 '20 at 08:35
  • Yes, I have done that and it is working: `$regex=@' (?ms)^(\s*<!-- OPC-ITEM-ENTRIES START -->\s*?\r?\n).*?\r?\n(\s*<!-- OPC-ITEM-ENTRIES END -->\s*) '@ $delete = (Get-Content -raw $file) -replace $regex, '$1$2' $delete |Set-Content C:\Users\marku\Desktop\GEA\Powershell\mdi-opc-items.html` – s0Nic Aug 14 '20 at 08:46
  • @s0Nic Ah yes, I suggested it the other way around :-) Wiktor Stribiżew provided the right answer with the explanation. – The fourth bird Aug 14 '20 at 14:45

1 Answer

The main issue here is that your pattern matches (consumes) the left-hand delimiter without capturing it, so the delimiter is removed together with the content.

To match and erase arbitrary content in between two multichar delimiters you need to either put both delimiters inside lookarounds:

-replace '(?<=left_hand_delim).*?(?=right_hand_delim)'

Or, use capturing groups in the regex and backreferences in the replacement:

-replace '(left_hand_delim).*?(right_hand_delim)', '$1$2'
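
As a minimal sketch, the two options can be compared on an in-memory string. The sample text and the variable names `$html`, `$viaLookarounds` and `$viaGroups` are made up for illustration; the markers are the ones from the question, and the replacement keeps a single line break between them.

```powershell
# Hypothetical sample input; only the two marker lines come from the question.
$html = "<p>before</p>`n<!-- START -->`nasdf`nmore`n<!-- END -->`n<p>after</p>"

# Option 1: both delimiters sit inside lookarounds, so only the content
# between them is matched. It is replaced with a single line break.
$viaLookarounds = $html -replace '(?s)(?<=<!-- START -->).*?(?=<!-- END -->)', "`n"

# Option 2: both delimiters are captured and written back with $1 and $2.
# The backticks keep PowerShell from expanding $1 and $2 as variables.
$viaGroups = $html -replace '(?s)(<!-- START -->).*?(<!-- END -->)', "`$1`n`$2"

# Both results keep the markers and drop everything between them.
$viaLookarounds
$viaGroups
```

The inline `(?s)` flag is needed in both patterns because the content spans several lines.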

You may use

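$regex='(?ms)(?<=^\s*<!-- OPC-ITEM-ENTRIES START -->\s*).*?(?=\s*<!-- OPC-ITEM-ENTRIES END -->)'
# The pattern has no capturing groups, so the match is replaced with nothing
# (a '$1$2' replacement would be inserted as literal text here).
(Get-Content -raw $file) -replace $regex | Set-Content $outfile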

See regex demo 1 and regex demo #2 (see Context tab).

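You must use the -raw option to read the file contents into a single string. Without it, Get-Content returns an array of lines and -replace is applied to each line separately, so the s (singleline) flag, which lets . match any char including newlines, never gets to see a newline.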
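
The difference can be checked with a small experiment. This is only a sketch: the temp file name `raw-demo.html` and all variable names are hypothetical, and the pattern is a shortened version that uses the markers from the question.

```powershell
# Write a three-line sample file to the temp folder (hypothetical file name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.html'
Set-Content -Path $path -Value '<!-- START -->', 'asdf', '<!-- END -->'

$regex = '(?s)(?<=<!-- START -->).*?(?=<!-- END -->)'

# Without -Raw: an array of three strings. -replace runs on each element
# separately, no single line contains both markers, so nothing changes.
$lines   = Get-Content -Path $path
$perLine = $lines -replace $regex

# With -Raw: one string that contains the line breaks, so the pattern
# can match across lines and the content between the markers is removed.
$text  = Get-Content -Path $path -Raw
$whole = $text -replace $regex

$perLine
$whole

Remove-Item -Path $path
```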

Wiktor Stribiżew