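You mention that your source is mixed with HTML, so I'll offer this variation, which avoids the complications HTML tags can introduce. Tags (including quoted attribute values), script bodies, <? ?> blocks, <!DOCTYPE>, CDATA and similar declarations, and existing <%-- --%> comments are each matched as a whole, so only real HTML comments get converted.
With the atomic group and the \G anchor, there is little risk of a stack overflow. The atomic group stops the engine from backtracking into a tag once it has matched, and \G makes each match start exactly where the previous one ended.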
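Here is a small Java sketch of the two features on toy patterns. Java is an assumption on my part, since the target is JSP, and `AnchorAndAtomicDemo` and `countMatches` are illustrative names. `\G` lets a match start only where the previous one ended, and an atomic group `(?>...)` throws away its backtracking positions once it has matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorAndAtomicDemo {

    // Counts how many times find() succeeds for the given pattern.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Without \G, find() skips ahead to any later 'a': 3 matches.
        System.out.println(countMatches("a", "aaba"));
        // With \G, each match must begin where the previous one ended,
        // so the scan stops at the 'b': 2 matches.
        System.out.println(countMatches("\\Ga", "aaba"));

        // A normal group can give back one 'a' so the trailing 'a' matches.
        System.out.println(Pattern.matches("(?:a+)a", "aaa")); // true
        // An atomic group keeps all three 'a's and never gives one back.
        System.out.println(Pattern.matches("(?>a+)a", "aaa")); // false
    }
}
```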
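Replace with $1<%--$2--%> (group 1 is written back unchanged, and the comment text in group 2 is wrapped in JSP comment delimiters).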
Raw Regex:
\G((?><(?:script(?:\s+(?:"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])*?)+)?\s*>[\S\s]*?</script\s*|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:(?:(?:"[\S\s]*?")|(?:'[\S\s]*?'))|(?:[^>]*?))+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?)))|%--[\S\s]*?--%)>|(?!<!--[\S\s]*?-->)[\S\s])*)<!--([\S\s]*?)-->
Stringed Regex:
"\\G((?><(?:script(?:\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|(?:(?!/>)[^>])*?)+)?\\s*>[\\S\\s]*?</script\\s*|(?:/?[\\w:]+\\s*/?)|(?:[\\w:]+\\s+(?:(?:(?:\"[\\S\\s]*?\")|(?:'[\\S\\s]*?'))|(?:[^>]*?))+\\s*/?)|\\?[\\S\\s]*?\\?|(?:!(?:(?:DOCTYPE[\\S\\s]*?)|(?:\\[CDATA\\[[\\S\\s]*?\\]\\])|(?:ATTLIST[\\S\\s]*?)|(?:ENTITY[\\S\\s]*?)|(?:ELEMENT[\\S\\s]*?)))|%--[\\S\\s]*?--%)>|(?!<!--[\\S\\s]*?-->)[\\S\\s])*)<!--([\\S\\s]*?)-->"
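A minimal Java sketch of applying the stringed regex with `Matcher.replaceAll`, assuming a Java/JSP toolchain. The class name `HtmlToJspComments` and the method `convert` are hypothetical. The pattern string is the one above, only split into concatenated pieces for line length.

```java
import java.util.regex.Pattern;

public class HtmlToJspComments {

    // Group 1: everything before the next real HTML comment (tags, script
    // bodies, <? ?>, <!DOCTYPE>/CDATA, existing <%-- --%>, plain text).
    // Group 2: the text inside <!-- ... -->.
    static final Pattern HTML_COMMENT = Pattern.compile(
        "\\G((?><(?:"
        + "script(?:\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|(?:(?!/>)[^>])*?)+)?\\s*>[\\S\\s]*?</script\\s*"
        + "|(?:/?[\\w:]+\\s*/?)"
        + "|(?:[\\w:]+\\s+(?:(?:(?:\"[\\S\\s]*?\")|(?:'[\\S\\s]*?'))|(?:[^>]*?))+\\s*/?)"
        + "|\\?[\\S\\s]*?\\?"
        + "|(?:!(?:(?:DOCTYPE[\\S\\s]*?)|(?:\\[CDATA\\[[\\S\\s]*?\\]\\])|(?:ATTLIST[\\S\\s]*?)|(?:ENTITY[\\S\\s]*?)|(?:ELEMENT[\\S\\s]*?)))"
        + "|%--[\\S\\s]*?--%)>"
        + "|(?!<!--[\\S\\s]*?-->)[\\S\\s])*)"
        + "<!--([\\S\\s]*?)-->");

    // Writes group 1 back unchanged and turns the HTML comment into a JSP comment.
    static String convert(String source) {
        return HTML_COMMENT.matcher(source).replaceAll("$1<%--$2--%>");
    }

    public static void main(String[] args) {
        System.out.println(convert("<p>Hi</p><!-- note -->"));
        System.out.println(convert("<script>var s = \"<!-- x -->\";</script><!-- y -->"));
    }
}
```

Because every match is anchored with `\G`, `replaceAll` walks the input once from start to finish. A comment inside a script body or a quoted attribute value is consumed as part of group 1 and left as it is.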
Expanded/Formatted:
\G                                  # G anchor
(                                   # (1 start)
    (?>                             # Atomic group start
        <                           # Begin a Tag <, but not an html comment
        (?:
            script                  # Script
            (?:
                \s+
                (?:
                    " [\S\s]*? "
                  | ' [\S\s]*? '
                  | (?:
                        (?! /> )
                        [^>]
                    )*?
                )+
            )?
            \s* >
            [\S\s]*? </script \s*
          |                         # or,
            (?:                     # Non-attribute
                /?
                [\w:]+
                \s*
                /?
            )
          |                         # or,
            (?:                     # Attribute
                [\w:]+
                \s+
                (?:
                    (?:
                        (?: " [\S\s]*? " )
                      | (?: ' [\S\s]*? ' )
                    )
                  | (?: [^>]*? )
                )+
                \s*
                /?
            )
          |                         # or,
            \?                      # <? ?> form
            [\S\s]*?
            \?
          |                         # or,
            (?:                     # Misc <! > forms
                !
                (?:
                    (?:
                        DOCTYPE
                        [\S\s]*?
                    )
                  | (?:
                        \[CDATA\[
                        [\S\s]*?
                        \]\]
                    )
                  | (?:
                        ATTLIST
                        [\S\s]*?
                    )
                  | (?:
                        ENTITY
                        [\S\s]*?
                    )
                  | (?:
                        ELEMENT
                        [\S\s]*?
                    )
                )
            )
          |                         # or,
            %-- [\S\s]*? --%        # JSP comment
        )
        >                           # End a Tag >
      |                             # or,
                                    # A character that does
                                    # not begin an html comment
        (?! <!-- [\S\s]*? --> )
        [\S\s]
    )*                              # Atomic group end, 0 to many times
)                                   # (1 end)
<!--
( [\S\s]*? )                        # (2) Finally, the html comment
-->
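The expanded form is written for free-spacing mode. Assuming Java again, it should compile as-is with the `Pattern.COMMENTS` flag (the inline equivalent is `(?x)`), which ignores whitespace and `#`-to-end-of-line comments in the pattern. A minimal sketch using one fragment of it (the JSP-comment alternative plus the closing `>`); `FreeSpacingDemo` is an illustrative name.

```java
import java.util.regex.Pattern;

public class FreeSpacingDemo {

    // Two lines taken from the expanded form; each ends with '\n' so the
    // '#' comment on a line stops at the end of that line.
    static final String FRAGMENT =
          "%-- [\\S\\s]*? --%     # JSP comment\n"
        + ">                      # End a Tag >\n";

    // With COMMENTS, whitespace and '#' comments are ignored, so this
    // behaves like the compact form "%--[\\S\\s]*?--%>".
    static final Pattern EXPANDED = Pattern.compile(FRAGMENT, Pattern.COMMENTS);
    static final Pattern COMPACT = Pattern.compile("%--[\\S\\s]*?--%>");

    public static void main(String[] args) {
        String sample = "%-- keep me --%>";
        System.out.println(EXPANDED.matcher(sample).matches()); // true
        System.out.println(COMPACT.matcher(sample).matches());  // true
        // Without the flag, the spaces and the comment text become literals.
        System.out.println(Pattern.compile(FRAGMENT).matcher(sample).matches()); // false
    }
}
```

In Java's COMMENTS mode, whitespace inside a character class is ignored too, which is harmless here because none of the classes in this regex contain a literal space.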