My input is as follows:

input = 'hello <script>alert("I am stealing your data");</script>'

I want to remove the complete script tag from the string and the output should look like

output = "hello"

I tried the following command, but it does not remove the complete tag:

input.replace(/(<([^>]+)>)/ig, '');

It gives the result:

"hello alert("I am stealing your data");"
Can Can
  • @NinaScholz what is missing? It's a simple straight forward question – Can Can May 16 '19 at 12:20
  • didn't dv. But if the non-"hello" part is constant, then you could use the following [answer](https://stackoverflow.com/questions/8529070/remove-portion-of-string-in-javascript) – Dixel May 16 '19 at 12:22
  • @Dixel given that it's a `script` tag, I wouldn't assume it's constant. This appears to be aimed at sanitising arbitrary data. – VLAZ May 16 '19 at 12:23
  • It's not constant. It's the input we get from the user. Kind of fixing an XSS attack – Can Can May 16 '19 at 12:23
  • Hmm ... If you're getting this string from your server, the attack has already taken place. – Teemu May 16 '19 at 12:30
  • @Teemu Just testing with that input :D – Can Can May 16 '19 at 12:32

2 Answers

You should not use regular expressions for this. Instead use the DOM parser capabilities:

var input = 'hello <script\>alert("I am stealing your data");</script\>';

var span = document.createElement("span");
span.innerHTML = input; // This will not execute scripts
// Remove all script tags within this span element:
span.querySelectorAll("script").forEach(script => script.remove());
// Get the remaining HTML out of it
var scriptless = span.innerHTML;

console.log(scriptless);

Just note that it is a very bad idea to let the user pass arbitrary HTML to your application. Sanitising involves a lot more than just removing script tags.
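When the application does not need to accept markup at all, one common alternative is to escape the input so the browser renders it as plain text instead of interpreting it. The following is a minimal sketch of that idea, not part of the approach above; the names `HTML_ESCAPES` and `escapeHtml` are hypothetical, and it assumes the result is placed in an HTML text or quoted-attribute context.

```javascript
// Hypothetical helper for illustration: escape instead of strip.
// Maps each character that is significant in HTML to its entity.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(str) {
  // Replace every significant character in a single pass.
  return String(str).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('hello <script>alert("x");</script>'));
// hello &lt;script&gt;alert(&quot;x&quot;);&lt;/script&gt;
```

Escaping keeps the user's text visible rather than silently dropping parts of it, which may or may not be what you want.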

trincot
  • *"Sanitising involves a lot more than just removing script tags."* That depends how you want to sanitize your string. A simple two liner would sanitize an html string just fine. `var replacements = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }; str.replace(/[&<>"']/g, match => replacements[match]);` – 3limin4t0r May 16 '19 at 13:00

You should not use a regular expression here: regex-based filters are easy to trick and are not fit for parsing HTML content, especially not untrusted HTML content.
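To illustrate how a regex filter can be tricked, here is a small sketch. The function `stripScriptsNaively` and its pattern are hypothetical, written only for this demonstration; they are not taken from the question. The payload hides a script tag inside fragments of another one, so a single pass that removes the inner tags reassembles the outer one.

```javascript
// Hypothetical naive filter, for illustration only.
function stripScriptsNaively(html) {
  // Removes each <script ...>...</script> pair found in a single pass.
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
}

// Removing the two inner "<script></script>" pairs glues the
// surrounding fragments back together into a working script tag.
const payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>";

console.log(stripScriptsNaively("hello <script>alert(1);</script>")); // "hello "
console.log(stripScriptsNaively(payload)); // "<script>alert(1)</script>"
```

Repeating the replacement until the string stops changing would close this particular hole, but it would still leave other vectors, such as event-handler attributes, untouched.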

Instead, you can use a DOMParser to create a new document and use the DOM API to find and remove all script tags, then return the rest of the content:

function sanitise(input) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(input, "text/html");
  
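  // Find all script tags. querySelectorAll returns a static NodeList,
  // so removing elements while iterating does not skip any
  // (getElementsByTagName returns a live collection that shrinks on removal).
  const scripts = doc.querySelectorAll('script');

  for (const script of scripts)
    script.remove(); // remove from the DOM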
  
  return doc.body.textContent.trim();
}

//using the + because otherwise Stack Snippets breaks
console.log(sanitise("hello <script>alert('I am stealing your data');</scr"+"ipt>"))
VLAZ