I have a file that contains the following, and I am trying to remove everything from <!-- to -->:
<!--<br>
/* Font Definitions */
-->
Only keep this part
Don't use a regex. HTML isn't a regular language, so it can't be properly parsed with a regex. It will succeed most of the time, but other times will fail. Spectacularly.
I recommend opening the file and reading it one character at a time, looking for the opening delimiter <!-- (the characters <, !, -, followed by -). Then, continue reading until you find the closing delimiter --> (the characters -, -, followed by >).
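# $path is assumed to hold the path of the file to clean up
$chars = [IO.File]::ReadAllText( $path ).ToCharArray()
$newFileContent = New-Object 'Text.StringBuilder'
$inComment = $false
for( $i = 0; $i -lt $chars.Length; ++$i )
{
    if( $inComment )
    {
        # Look for the closing delimiter -->
        if( $chars[$i] -eq '-' -and $chars[$i+1] -eq '-' -and $chars[$i+2] -eq '>' )
        {
            $inComment = $false
            # Skip the rest of -->; the loop's ++$i then moves past the '>'
            $i += 2
        }
        continue
    }
    # Look for the opening delimiter <!--
    if( $chars[$i] -eq '<' -and $chars[$i+1] -eq '!' -and $chars[$i+2] -eq '-' -and $chars[$i+3] -eq '-' )
    {
        $inComment = $true
        # Skip the rest of <!--; the loop's ++$i then moves past the last '-'
        $i += 3
        continue
    }
    # Outside a comment: keep the character ([void] stops Append from writing to the pipeline)
    [void] $newFileContent.Append( $chars[$i] )
}
$newFileContent.ToString() | Set-Content -Path $path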
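The same scan can be wrapped in a function that takes and returns a string instead of a file, which makes it easy to try out on sample text. This is a sketch only: the function name Remove-HtmlComment is hypothetical, and it compares Substring slices against '<!--' and '-->' instead of indexing single characters, so it never reads past the end of the input.

```powershell
# Hypothetical helper: strips <!-- ... --> comments from a string using a character scan
function Remove-HtmlComment
{
    param( [string] $Text )

    $result = New-Object 'Text.StringBuilder'
    $inComment = $false
    $i = 0

    while( $i -lt $Text.Length )
    {
        # Number of characters left from the current position
        $remaining = $Text.Length - $i

        if( $inComment )
        {
            if( $remaining -ge 3 -and $Text.Substring( $i, 3 ) -eq '-->' )
            {
                # Step over the closing delimiter and leave comment mode
                $inComment = $false
                $i += 3
            }
            else
            {
                # Discard characters inside the comment
                $i += 1
            }
            continue
        }

        if( $remaining -ge 4 -and $Text.Substring( $i, 4 ) -eq '<!--' )
        {
            # Step over the opening delimiter and enter comment mode
            $inComment = $true
            $i += 4
            continue
        }

        # Outside a comment: keep the character
        [void] $result.Append( $Text[$i] )
        $i += 1
    }

    return $result.ToString()
}

# The question's sample, with `n as the newline escape in a double-quoted string
$sample = "<!--<br>`n/* Font Definitions */`n-->`nOnly keep this part"
Remove-HtmlComment -Text $sample
```

For the sample this returns a newline followed by Only keep this part, because the line break after --> lies outside the comment.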
Regular expressions to the rescue again:
@'
<!--<br>
/* Font Definitions */
-->
Only keep this part
'@ -replace '(?s)<!--.*?-->', ''
(?s) turns on single-line mode, which makes the dot match newlines as well :)
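A quick way to see what (?s) does is to run the pattern with and without it on the question's sample text. This sketch uses the lazy .*? so each match stops at the first --> and an empty comment such as <!----> also matches; the variable names are just for illustration.

```powershell
# The question's sample, with `n as the newline escape in a double-quoted string
$html = "<!--<br>`n/* Font Definitions */`n-->`nOnly keep this part"

# Without (?s), . stops at newlines, so a comment spanning several lines never matches
$withoutSingleline = $html -replace '<!--.*?-->', ''

# With (?s), . also matches newlines, so the whole comment is removed
$withSingleline = $html -replace '(?s)<!--.*?-->', ''

$withoutSingleline -eq $html                  # True: nothing was replaced
$withSingleline -eq "`nOnly keep this part"   # True
```

To apply this to a file, read it as one string with Get-Content -Raw (PowerShell 3.0 and later) rather than line by line, since a line-by-line read would split a multi-line comment across separate strings, then pipe the result to Set-Content.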