I finally got the JsHtmlSanitizer working as a standalone client-side script. Now I'd like to remove all HTML tags from a string, not just script tags and links. This example

html_sanitize('<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>');

returns "<b>hello</b>", but I'd like to remove all tags.

webnoob
John Doe

2 Answers

Why not use regular expressions to remove all HTML tags after sanitizing?

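// The closing script tag is written as <\/script> so the literal cannot end an inline script block.
var input = '<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>';
// Sanitize first, then strip whatever tags the sanitizer left in place.
var output = html_sanitize(input).replace(/<[^>]+>/g, '');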

This should strip your input string of all html tags after sanitization.
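
As a rough sketch, the two steps can be wrapped in one helper. The names `stripAllTags` and `fakeSanitize` are made up for illustration; `fakeSanitize` is only a stand-in so the snippet runs on its own, and in practice you would pass `html_sanitize` instead.

```javascript
// Hypothetical helper: `sanitize` is whatever sanitizer is in use,
// for example the html_sanitize function from the question.
function stripAllTags(html, sanitize) {
  var safe = sanitize(html);
  // Remove every tag the sanitizer left in place.
  return safe.replace(/<[^>]+>/g, '');
}

// Stand-in for demonstration only (assumption): it merely drops script
// elements and is NOT a real sanitizer.
function fakeSanitize(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

var result = stripAllTags('<b>hello</b><script>alert(0)<\/script>', fakeSanitize);
console.log(result); // hello
```

Running the sanitizer first means the regex only sees markup the sanitizer has already cleaned up, which is less fragile than running it on raw input.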

If you only need basic sanitization (removing script and style elements together with their content, plus all remaining HTML tags), you could do the entire thing with a single regular expression, as in the example below.

var input = '<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>';
input += '<script> if (1 < 2) { alert("This script should be removed!"); } <\/script><style type="text/css">.cssSelectorShouldBeRemoved > .includingThis { background-color: #FF0000; } </style>';

// First alternative: a script or style element together with its (possibly empty) content.
// Second alternative: any remaining tag.
var output = input.replace(/(?:<(?:script|style)[^>]*>[\s\S]*?<\/(?:script|style)[^>]*>)|<[^>]+>/ig, '');
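
To illustrate why script and style elements need to be removed as a unit, here is a small sketch; `stripTagsAndBlocks` is a hypothetical name. The `*?` quantifier lets an empty element such as `<script src="..."></script>` match on its own, so text sitting between two script elements is kept.

```javascript
// Hypothetical helper wrapping the regex-only approach.
function stripTagsAndBlocks(html) {
  return html.replace(
    /(?:<(?:script|style)[^>]*>[\s\S]*?<\/(?:script|style)[^>]*>)|<[^>]+>/ig,
    ''
  );
}

// Script content containing "<" is removed together with its element.
var a = stripTagsAndBlocks('<b>hello</b><script>if (1 < 2) { alert(1); }<\/script> world');
console.log(a); // hello world

// An empty script element is matched by itself, so "visible" survives.
var b = stripTagsAndBlocks('<script src="a.js"><\/script>visible<script>x<\/script>');
console.log(b); // visible
```

Keep in mind this is still pattern matching rather than HTML parsing, so it is only suitable for the basic case described above.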
Tanzeel Kazi
    Don't you know that there's a bot that crawls SO for answers containing "regular expressions" and "HTML", and auto-posts this link as a comment: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – user123444555621 Dec 28 '12 at 08:01

Use the JavaScript below to remove all HTML tags from the string you get back from html_sanitize().

var output = html_sanitize('<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>');

output = output.replace(/<[^>]*>/g, "");
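
One thing to watch for, shown as a small sketch with made-up function names: a pattern built on `.`, such as `/<.*?>/g`, does not match a tag that spans a line break, because `.` does not match newlines in JavaScript by default. A negated character class like `[^>]` does.

```javascript
// Hypothetical helpers comparing the two patterns.
function stripWithDot(html) {
  return html.replace(/<.*?>/g, '');
}

function stripWithClass(html) {
  return html.replace(/<[^>]*>/g, '');
}

// The opening tag contains a line break between "<a" and "href".
var multiLine = '<a\n  href="x">link</a>';

console.log(JSON.stringify(stripWithDot(multiLine)));   // "<a\n  href=\"x\">link"
console.log(JSON.stringify(stripWithClass(multiLine))); // "link"
```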

Hope it helps :)

Avishek