
I want to remove `<script>` tags from an HTML string using a regex.

I have the following code, which works, but fails when two scripts appear back to back:

```javascript
function removeScriptsFromHtmlStr(html) {
  const regex = /<script(?:(?!\/\/)(?!\/\*)[^'"]|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\/\/.(?:\n)|\/\*(?:(?:.|\s))*?\*\/)*?<\/script>/;
  const result = html.replace(regex, '');
  return result;
}
```

For example, running this through the function works fine:

```html
<script>alert(document.cookie);</script>
```

But this doesn't:

```html
<script>alert(document.cookie);</script><script>alert(document.cookie);</script>
```

How can I update the regex to fix this?

edited by halfer
asked by cup_of
  • Add a `g` flag (it stands for global), so `/your regex/g`; otherwise only the first match will be replaced. – Robbie Milejczak Feb 18 '20 at 19:36
  • Does this answer your question? [Removing all script tags from html with JS Regular Expression](https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression) – Rapsssito Feb 18 '20 at 19:36
  • If this is done for security reasons, then don't do it with RegExp; it's far too easy to trick any RegExp you can ever create. Create a DocumentFragment instead, attach the HTML, and remove the script elements from the parsed fragment. – Teemu Feb 18 '20 at 19:43
  • Don't parse HTML with regex. [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – CAustin Feb 18 '20 at 19:45
  • @RobbieMilejczak Dang, I can't believe I missed that. Seems to be working for me, thanks! If you want to post an answer, I'll accept it. – cup_of Feb 18 '20 at 19:47
  • FYI, there are plenty of ways to break that regexp. – epascarello Feb 18 '20 at 19:55
  • If you really end up doing the task using [RegExp](https://imgur.com/ynO1PcJ) ... – Teemu Feb 19 '20 at 05:59
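A minimal sketch of the fix from the first comment, assuming Node.js or any modern JavaScript engine. The question's regex is kept unchanged; the only difference is whether the `g` flag is set. Without it, `String.prototype.replace` stops after the first match. The names `scriptPattern` and `removeFirstScript` are hypothetical, added only for the comparison.

```javascript
// The regex from the question, with no flags.
const scriptPattern = /<script(?:(?!\/\/)(?!\/\*)[^'"]|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\/\/.(?:\n)|\/\*(?:(?:.|\s))*?\*\/)*?<\/script>/;

// Hypothetical helper: without `g`, replace() removes only the first match.
function removeFirstScript(html) {
  return html.replace(scriptPattern, '');
}

// The question's function with the `g` flag added: every match is removed.
function removeScriptsFromHtmlStr(html) {
  return html.replace(new RegExp(scriptPattern.source, 'g'), '');
}

const twoScripts =
  '<script>alert(document.cookie);</script><script>alert(document.cookie);</script>';

console.log(removeFirstScript(twoScripts)); // <script>alert(document.cookie);</script>
console.log(JSON.stringify(removeScriptsFromHtmlStr(twoScripts))); // ""
```

As the other comments point out, this only fixes the back-to-back case; it does not make regex-based removal safe against hostile input.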

3 Answers


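Older JavaScript engines do not support the single-line (dotAll) `s` flag, which was only added in ES2018, so you need a workaround for `.`: use `[\s\S]`, which matches any character including newlines. The `g` flag makes `replace` remove every match, and `i` also catches uppercase tags such as `<SCRIPT>`:

```javascript
/<script.*?>[\s\S]*?<\/script>/gi
```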

https://regex101.com/r/7iFLnA/1
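A small check of this pattern, assuming Node.js or a modern browser. It compares the `[\s\S]` workaround with the ES2018 `s` (dotAll) flag on a sample containing a multi-line, uppercase script with attributes followed by a one-line script. The sample string and variable names are illustrative only.

```javascript
// The answer's pattern: [\s\S] matches any character, including newlines.
const portable = /<script.*?>[\s\S]*?<\/script>/gi;

// Same idea with the ES2018 `s` (dotAll) flag, where `.` also matches newlines.
// Note that the `.*?` inside the opening tag can now cross newlines as well.
const dotAll = /<script.*?>.*?<\/script>/gis;

const multiLine = [
  '<p>keep</p>',
  '<SCRIPT type="text/javascript">',
  '  alert(1);',
  '</SCRIPT>',
  '<script>alert(2);</script>',
].join('\n');

// Both patterns remove the two scripts and leave the paragraph and the newlines.
console.log(JSON.stringify(multiLine.replace(portable, ''))); // "<p>keep</p>\n\n"
console.log(JSON.stringify(multiLine.replace(dotAll, '')));   // "<p>keep</p>\n\n"
```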

MonkeyZeus

A lazy quantifier should do the trick: `<script>(.+?)<\/script>`
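A sketch of why the lazy `+?` matters, assuming Node.js or any modern engine. It contrasts greedy `.+` with lazy `.+?` on a string where HTML sits between two scripts. The `g` flag is added here (the answer does not mention it) so that both scripts are removed; the sample string and names are illustrative.

```javascript
const mixed = '<script>a()</script><p>keep me</p><script>b()</script>';

// Greedy: .+ runs on to the LAST </script>, deleting the <p> in between.
const greedy = /<script>(.+)<\/script>/g;

// Lazy: .+? stops at the FIRST </script>, so each script is matched separately.
const lazy = /<script>(.+?)<\/script>/g;

console.log(JSON.stringify(mixed.replace(greedy, ''))); // ""
console.log(JSON.stringify(mixed.replace(lazy, '')));   // "<p>keep me</p>"
```

Because `.` does not match newlines without the `s` flag, this pattern also skips scripts whose body spans several lines, and it requires a bare `<script>` opening tag with no attributes.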

Stanton

Try this regex:

```javascript
/<script.*?>.*?<\/script>/igm
```

https://stackoverflow.com/a/15816902/5812095
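A quick check of this pattern, assuming Node.js or a modern browser. The `m` flag only changes how `^` and `$` behave, so it has no effect here; and since `.` does not match newlines without the `s` flag, a script whose body spans several lines is left in place. The sample strings and the name `bilal` are illustrative.

```javascript
const bilal = /<script.*?>.*?<\/script>/igm;

// One-line scripts, with or without attributes, are removed.
const oneLine = '<script>alert(1);</script><script src="x.js"></script>';

// A script body that spans lines is not matched: `.` stops at each "\n".
const multiLineScript = '<script>\nalert(1);\n</script>';

console.log(JSON.stringify(oneLine.replace(bilal, '')));         // ""
console.log(multiLineScript.replace(bilal, '') === multiLineScript); // true
```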

Bilal Amin