3

I'm trying to write a function that checks a string parameter against an array of special HTML entities (for example, when the user enters '&amp;' instead of '&') and then adds a span around each entity it finds.

How would I search through the string parameter to find this? Would it be a regex?

This is my code thus far:

function ampersandKiller(input) {
    var specialCharacters = ['&amp;', '&nbsp;']
    if($(specialCharacters).contains('&amp;')) {
        alert('hey')
    } else {
        alert('nay')
    }
}

Obviously this doesn't work. Does anyone have any ideas?

So if a string like 'My name is &amp;' were passed, it would render 'My name is <span>&amp;</span>'. If a special character were listed more than once, as in 'I really like &amp;&amp;&amp;', it would render a span around each occurrence. The user must also be able to use a plain &.

Sergiu Dumitriu
streetlight
  • Could you provide simple test cases? like which string should match and which shouldn't. – Shiplu Mokaddim Dec 07 '12 at 12:40
  • Just added. Thank you for the tip! – streetlight Dec 07 '12 at 12:43
  • What are you *actually* trying to do? It seems to me that you are trying to solve a problem that you should not even have in the first place. Can you explain that problem? – Tomalak Dec 07 '12 at 12:53
  • I'm working on an application based on user input. I'm trying to allow the user to pass special characters and HTML entities -- but to stop the entity from rendering, I'm looking to wrap it in an tag before it comes back from the server. Maybe this isn't the best method of attack? – streetlight Dec 07 '12 at 12:55
  • Hm... I think this might turn out as a really nasty can of worms. So you have a ` – Tomalak Dec 07 '12 at 13:03
  • That's what I'm trying to avoid -- the html entity rendering. They should be able to enter & & and have it render exactly the same without rendering. I thought using the tag may be the best method. – streetlight Dec 07 '12 at 13:05
  • An HTML entity like `&amp;` is merely a "transport encoding". It still *means* `&`. If you want it to display as `&amp;` in a browser, it would have to be HTML-encoded (at which point it turns into `&amp;amp;`). So if verbatim output is what you want, HTML encoding the input is what you must do. In any case you should not mess with the input via string replace or regex. – Tomalak Dec 07 '12 at 13:12
  • @adeneo No, that is wrong. The contents of `<pre>` tags must still be HTML encoded. The only difference between `<pre>` and other tags is that the line breaks are rendered as if they were `<br>` tags. – Tomalak Dec 07 '12 at 13:17
  • @Tomalak - I agree, it's really, really wrong! My answer below is just sooo much better ? – adeneo Dec 07 '12 at 13:26
  • @adeneo I think the OP simply writes the user's input to an HTML page but does not use HTML encoding. So the `&amp;` and all the other entities are "lost" when the browser renders it. To apply proper HTML encoding would be the right thing to do. Any kind of string search-and-replace would be the wrong thing to do. – Tomalak Dec 07 '12 at 13:45
  • @Tomalak, what is the best way to HTML encode on input? – streetlight Dec 07 '12 at 13:53
  • If you do it on the server side, there is an HTML encode function in every programming language. If you're trying to do it on the client side, assigning a string to the text value of an element instead of using innerHTML would be correct. It depends on what you do. – Tomalak Dec 07 '12 at 14:13
  • In PHP, [`htmlspecialchars()`](http://php.net/manual/en/function.htmlspecialchars.php) function can be used to encode special characters in HTML code on server side. – Marat Tanalin Dec 16 '12 at 14:25
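
To illustrate the encoding approach suggested in the comments above, here is a minimal sketch of a client-side encoder. The name `escapeHtml` and its replacement table are assumptions made for this example and do not come from the thread. In a browser, assigning the raw input to an element's `textContent` achieves the same effect without any manual escaping.

```javascript
// Hypothetical helper (not from the thread): HTML-encode a string so the
// browser displays it verbatim instead of interpreting entities or tags.
function escapeHtml(text) {
    var replacements = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    };
    // Replace each special character with its entity in a single pass.
    return String(text).replace(/[&<>"']/g, function (ch) {
        return replacements[ch];
    });
}

// The user typed an entity and a plain ampersand; both survive as typed
// once the encoded string is written into the page.
var encoded = escapeHtml('My name is &amp; and I like & too');
// encoded is 'My name is &amp;amp; and I like &amp; too'
```

Because every `&` is encoded, an entity the user typed is shown literally rather than rendered, which is the behaviour the question asks for, with no need for a wrapping tag.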

3 Answers

3
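function htmlEntityChecker(input) {
    // Entities to wrap; extend this list as needed.
    var characterArray = ['&amp;', '&nbsp;'];
    $.each(characterArray, function(idx, ent) {
        if (input.indexOf(ent) !== -1) {
            // ent is used as a regex pattern, so entries must not
            // contain regex metacharacters.
            var re = new RegExp(ent, "g");
            input = input.replace(re, '<span>' + ent + '</span>');
        }
    });

    return input;
}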
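
If the list grows to include entries with regex metacharacters, the pattern needs escaping. The following is a sketch of one way to do that without jQuery; `escapeRegExp` and `wrapEntities` are hypothetical names chosen for this illustration, not part of the answer above.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of the function above: builds one combined pattern
// from the list and wraps every match in a single pass.
function wrapEntities(input, entities) {
    if (entities.length === 0) {
        return input; // An empty pattern would match at every position.
    }
    var pattern = new RegExp(entities.map(escapeRegExp).join('|'), 'g');
    // "$&" in the replacement string stands for the matched text.
    return input.replace(pattern, '<span>$&</span>');
}

var result = wrapEntities('I really like &amp;&amp; and&nbsp;&', ['&amp;', '&nbsp;']);
// result is 'I really like <span>&amp;</span><span>&amp;</span> and<span>&nbsp;</span>&'
```

A plain `&` is left untouched because it is not in the list.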


adeneo
  • How would he wrap the ent in a span? – CR41G14 Dec 07 '12 at 12:43
  • This is great, and exactly what I asked for! However @Jack 's solution did the same thing but more concisely. I will keep this for future reference though as it's beautifully written. – streetlight Dec 07 '12 at 13:37
2

You could use this regular expression to find and wrap the entities:

input.replace(/&amp;|&nbsp;/g, '<span>$&</span>')

For any kind of entity, you could use this too:

input.replace(/&(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);/gi, '<span>$&</span>');

It matches named entities as well as decimal and hexadecimal numeric entities. For example:

'test & &amp; &#60;'.replace(/&(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);/gi, '<span>$&</span>');

Output:

test & <span>&amp;</span> <span>&#60;</span>
Ja͢ck
  • The array will eventually (and hopefully) contain all the major entities, not just these two. Would it be possible to use replace to do that? – streetlight Dec 07 '12 at 12:40
  • @streetlight I've added an expression that would match any entity, even numeric ones. – Ja͢ck Dec 07 '12 at 13:35
  • How would it be possible to refactor this into an if-else statement? So, for example, if an input had any characters found through this regex, then do something? – streetlight Dec 07 '12 at 15:40
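
One way to turn the pattern into an if-else check, sketched here under some assumptions: the pattern is a slightly broader variant of the one in the answer (it also accepts hexadecimal entities and names containing digits), and `containsEntity` and `classifyInput` are hypothetical names used only for this example.

```javascript
// No "g" flag here: a global regex keeps state in lastIndex between
// test() calls, which can give surprising results when it is reused.
var ENTITY_PATTERN = /&(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);/i;

// Hypothetical helper: true if the input contains at least one entity.
function containsEntity(input) {
    return ENTITY_PATTERN.test(input);
}

// Hypothetical example of branching on the result.
function classifyInput(input) {
    if (containsEntity(input)) {
        return 'has entities';
    } else {
        return 'plain text';
    }
}
```

With this sketch, `classifyInput('I like &amp;')` returns `'has entities'`, while `classifyInput('Tom & Jerry')` returns `'plain text'` because a bare `&` is not followed by a name and a semicolon.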
0

Another option would be to make the browser do a decode for you and check if the length is any different... check this question to see how to unescape the entities. You can then compare the length of the original string with the length of the decoded. Example below:

function htmlDecode(input) {
    // A textarea decodes entities but treats its content as text rather
    // than markup, so tags in the input do not create child elements.
    var e = document.createElement('textarea');
    e.innerHTML = input;
    return e.value;
}
function hasEntities(input) {
    // Decoding shortens the string whenever it contains an entity.
    return input.length !== htmlDecode(input).length;
}
alert(hasEntities('a'))
alert(hasEntities('&amp;'))

The above will show two alerts. First false and then true.

kahowell