5

When a user creates a message, there is a multibox connected to a design panel that lets users change the font, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.

Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)

It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links and images and add line breaks.

This means that all HTML tags should be stripped except <a></a>, <img> and <br> tags.

It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:

<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">

How can I accomplish this with JavaScript?

I used to use the following C# code-behind, which worked perfectly, but it strips all HTML tags except the <br> tag.

public string Strip(string text)
{
   return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}

Any kind of help is appreciated a lot.

Obsivus
  • You should use a proper HTML sanitizer for this. Regex is not really suited for this task. BTW can the user freely type HTML? If that's so an `` tag is all you need to perform an effective XSS attack/cookie stealing. – Fabrício Matté Aug 08 '13 at 14:15
  • Bears repeating: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – musicnothing Aug 08 '13 at 14:17
  • @FabrícioMatté it's a local project – Obsivus Aug 08 '13 at 14:19

4 Answers

9

Does this do what you want? http://jsfiddle.net/smerny/r7vhd/

$("body").find("*").not("a,img,br").each(function() {
    $(this).replaceWith(this.innerHTML);
});

Basically select everything except a, img, br and replace them with their content.

Smern
  • Was curious on performance - http://jsperf.com/replacewith-vs-unwrap. Note that using a chain `find().not()` also generally affects performance positively as jQuery will have to convert and use this behind the scenes anyway (actually this seems to provide the biggest performance bonus). – Smern Aug 08 '13 at 14:42
  • eh.. looks like .not() only outperforms in Chrome.. in firefox/IE :not is around twice as fast http://jsperf.com/replacewith-vs-unwrap/2 – wirey00 Aug 12 '13 at 13:57
  • You shouldn't use `innerHTML` here because the loop won't recurse into any tags that you unroll this way. So, for example, for nested tags like ``, the `` will not be replaced. For this reason, I'd use `innerText` instead of `innerHTML` here. – BadgerPriest Jun 27 '14 at 00:10
  • @BadgerPriest, if I used `innerText` it would remove the img, a, and br tags within other tags which would not be good. A better alternative might be to recursively go through the children. – Smern Feb 04 '15 at 15:02
3

Smerny's answer works well, except when the HTML structure looks like this:

var s = '<div><div><a href="link">Link</a><span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
    $(this).replaceWith(this.innerHTML);
});
console.log($s.html());

The live code is here: http://jsfiddle.net/btvuut55/1/

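This happens when the content is wrapped in two or more nested elements (two divs in the example above).

jQuery reaches the outermost div first and replaces it with its innerHTML, which still contains the span. That markup is parsed into new elements, so the original span collected by find() is now detached from the document; replacing it later has no visible effect, and the new span is retained.

The answer using $('#container').find('*:not(br,a,img)').contents().unwrap() fails to handle elements with empty content, because an empty element has no contents to unwrap.

A working solution is simple: loop from the innermost elements outward: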

var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
    var e = $elements[i];
    $(e).replaceWith(e.innerHTML);
}

The working copy is: http://jsfiddle.net/btvuut55/3/
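
The same inside-out idea can also be written as a recursion over children, as one of the comments above suggests. The sketch below is only a model of that idea and does not touch the DOM: it assumes a hypothetical plain-object tree in which text nodes are strings and elements look like { tag, attrs, children }, and the names `ALLOWED`, `VOID_TAGS` and `serialize` are invented for this example. Attribute values are written out unescaped, which is a simplification.

```javascript
// Hypothetical model, not the DOM API: text nodes are plain strings,
// elements are objects of the form { tag, attrs, children }.
var ALLOWED = new Set(["a", "img", "br"]);
var VOID_TAGS = new Set(["img", "br"]); // tags written without a closing tag

function serialize(node) {
  if (typeof node === "string") return node; // text node: keep as-is

  // Children are processed first, so nesting depth does not matter.
  var inner = (node.children || []).map(serialize).join("");

  // Disallowed element: drop the tag but keep its (already cleaned) content.
  if (!ALLOWED.has(node.tag)) return inner;

  // Allowed element: keep the tag together with all of its attributes.
  var attrs = Object.entries(node.attrs || {})
    .map(function (pair) { return " " + pair[0] + '="' + pair[1] + '"'; })
    .join("");
  var open = "<" + node.tag + attrs + ">";
  return VOID_TAGS.has(node.tag) ? open : open + inner + "</" + node.tag + ">";
}

// Same shape as the failing example: two wrapper divs, a span and an empty li.
var tree = {
  tag: "div",
  children: [{
    tag: "div",
    children: [
      { tag: "a", attrs: { href: "link" }, children: ["Link"] },
      { tag: "span", children: [" Span"] },
      { tag: "li", children: [] }
    ]
  }]
};
console.log(serialize(tree));
```

Because every element cleans its children before deciding whether to keep its own tag, both the nested wrappers and the empty li disappear in a single pass, while the link keeps its href.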

Joy
2

With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:

$('#container').find('*:not(br,a,img)').contents().unwrap()


wirey00
0

I think it would be better to extract the good tags. It is easier to match a few tags than to remove every other element and all the HTML possibilities. Try something like this; I tested it and it works fine:

// The following regex matches the good tags with their attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";

var match = ptt.exec(input);
while (match) {
    result += match[0]; // match[0] is the full matched text
    match = ptt.exec(input);
}

// result will contain the clean HTML with only the good tags
console.log(result);
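
For a pure string-to-string pass, another option is to port the C# pattern from the question to JavaScript and widen its lookahead so that opening and closing a, img and br tags are skipped. The sketch below is a minimal illustration, not a sanitizer. The function name `stripTags` is made up for this example, and like any regex approach it assumes reasonably well-formed input with no < or > inside attribute values. It leaves every attribute on the kept tags untouched, including event handlers, so it is not suitable for untrusted input.

```javascript
// Hypothetical helper: removes every tag except <a>, </a>, <img> and <br>.
// The negative lookahead skips tags whose name is exactly a, img or br.
// The inner lookahead requires whitespace, "/" or ">" right after the name,
// so look-alikes such as <abbr> or <body> are still stripped.
function stripTags(html) {
  return html.replace(/<(?!\/?(?:a|img|br)(?=[\s\/>]))[^<>]+>/gi, "");
}

var sample =
  '<div><span style="color:red">Hi</span> <a href="link">Link</a><br>' +
  '<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;"></div>';

// Logs the text with only the link, line break and image tags left in place.
console.log(stripTags(sample));
```

Unlike the extraction loop above, this removes the unwanted tags and keeps the surrounding text, so plain text outside the allowed tags survives.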
  • Regex is generally not the way to go about parsing DOM. Also, your example doesn't seem to be working quite right: http://jsfiddle.net/smerny/r7vhd/1/, it's leaving an open `` and removing a `blah` within `blah` in this example. – Smern Aug 08 '13 at 17:47
  • No no, I would never process DOM elements with string manipulation, not after jQuery anyways. That was in case the input was received as a pure string... but, you're right, the regex has bugs, i'll fix them and edit the post – Fernando Gomez Aug 08 '13 at 18:02
  • Okay, but you can convert a string to a jquery/dom object like this `$('')` and then you can use jquery functions on it just like if it was pulled from the document. – Smern Aug 08 '13 at 18:13
  • Yeah, but if you don't need the DOM elements it would be pointless to create jQuery objects, unless you will eventually insert them in the DOM. If the procedure is from string to string, using regex is way faster.. that is, a good regex, mine has bugs – Fernando Gomez Aug 08 '13 at 18:22