10

I spent some time looking for the best way to escape an HTML string and found some discussions on that: discussion 1, discussion 2. They led me to the replaceAll function. Then I ran performance tests and tried to find a solution with similar speed, with no success :(

Here is my final set of test cases. I found it on the net and expanded it with my own attempts (the 4 cases at the bottom), and I still cannot reach replaceAll() performance.

What is the secret that makes the replaceAll() solution so speedy?

Greets!

Code snippets:

String.prototype.replaceAll = function(str1, str2, ignore) 
{
   return this.replace(new RegExp(str1.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g,"\\$&"),(ignore?"gi":"g")),(typeof(str2)=="string")?str2.replace(/\$/g,"$$$$"):str2);
};

Credits to qwerty.
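To make the two inner replace calls easier to follow, here is a minimal sketch that pulls them out into standalone helpers. The names META, escapeRegExp and escapeReplacement are hypothetical and not part of the original snippet; the regular expressions themselves are copied from it.

```javascript
// Hypothetical helper names; the regexes are taken from replaceAll above.
var META = /([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g;

// "\\$&" is the replacement string \$& : a backslash followed by the matched
// text, so every metacharacter in the search string gets a backslash prefix.
function escapeRegExp(str) {
  return str.replace(META, "\\$&");
}

// "$$$$" is read as two "$$" escapes, so each "$" in str becomes "$$".
// Used later as a replacement string, "$$" produces one literal "$".
function escapeReplacement(str) {
  return str.replace(/\$/g, "$$$$");
}

var pattern = escapeRegExp("a.b");          // a\.b
var replacement = escapeReplacement("$&");  // $$&
var out = "a.b axb".replace(new RegExp(pattern, "g"), replacement);
// out is "$& axb": only the literal "a.b" matched, and "$&" was inserted as text
```

Without the first helper, the "." would match any character; without the second, "$&" in the replacement would be expanded to the matched text.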

Fastest case so far:

html.replaceAll('&', '&amp;').replaceAll('"', '&quot;').replaceAll("'", '&#39;').replaceAll('<', '&lt;').replaceAll('>', '&gt;');
Community
Saram

3 Answers

4

Finally I found it! Thanks, Jack, for pointing me to the jsperf specifics:

I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works.

The answer is:

replaceAll - this hits a jsperf limitation/bug triggered by the special sequence "\\$&", so the results were wrong.

compile() - when called with no arguments, it changes the regexp definition to /(?:)/. I don't know whether that is a bug, but the performance results were useless after it was called.

Here are my result-safe tests.

Finally I prepared proper test cases.

The result is that, for HTML escaping, the best way is to use a native DOM-based solution, like:

document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML

or, if you repeat it many times, you can reuse variables prepared once:

//prepare variables
var DOMtext = document.createTextNode("test");
var DOMnative = document.createElement("span");
DOMnative.appendChild(DOMtext);

//main work for each case
function HTMLescape(html){
  DOMtext.nodeValue = html;
  return DOMnative.innerHTML
}

Thank you all for collaboration & posting comments and directions.

jsperf bug description

Inside the jsperf test, String.prototype.replaceAll was defined as follows:

function (str1, str2, ignore) {
  return this.replace(new RegExp(str1.replace(repAll, "\\#{setup}"), (ignore ? "gi" : "g")), (typeof(str2) == "string") ? str2.replace(/\$/g, "$$") : str2);
}
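Compared with the original definition, "\\$&" has become "\\#{setup}" and "$$$$" has become "$$". The sketch below illustrates the effect. It assumes that repAll holds the metacharacter class from the original function (its definition is not shown here), and the standalone function names are hypothetical.

```javascript
// Assumption: repAll is the metacharacter class from the original replaceAll.
var repAll = /([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g;

// The function as originally written.
function replaceAllIntended(s, str1, str2) {
  return s.replace(new RegExp(str1.replace(repAll, "\\$&"), "g"),
    (typeof str2 == "string") ? str2.replace(/\$/g, "$$$$") : str2);
}

// The function as it appeared in the jsperf test.
function replaceAllMangled(s, str1, str2) {
  return s.replace(new RegExp(str1.replace(repAll, "\\#{setup}"), "g"),
    (typeof str2 == "string") ? str2.replace(/\$/g, "$$") : str2);
}

var intended = replaceAllIntended("a & b", "&", "&amp;");    // "a &amp; b"
// Here "&" is turned into the pattern \#{setup}, which never matches the input.
var mangled = replaceAllMangled("a & b", "&", "&amp;");      // "a & b"
// Characters outside the class, such as the double quote, are still replaced.
var quotes = replaceAllMangled('say "hi"', '"', '&quot;');   // "say &quot;hi&quot;"
```

So for "&", "<" and ">" the mangled version searches for text that is not in the input and replaces nothing, which would make it look faster than it really is.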
Community
Saram
  • 1
    Could you link to the jsperf bug? – Ja͢ck Jul 03 '13 at 15:50
  • @Jack when you start any buggy test (where replaceAll is defined in setup() procedure) go to console and display body of `String.prototype.replaceAll`. I'll do some comment to answer in a minute. – Saram Jul 03 '13 at 16:20
2

As far as performance goes, I find that the below function is as good as it gets:

String.prototype.htmlEscape = function() {
    var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

    return function() {
        return this
          .replace(amp_re, '&amp;')
          .replace(sq_re, '&#39;')
          .replace(quot_re, '&quot;')
          .replace(lt_re, '&lt;')
          .replace(gt_re, '&gt;');
    }
}();

It initializes the regular expressions and returns a closure that actually performs the replacement.
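As a usage sketch, the same closure pattern can also be written as a plain function instead of a String.prototype extension. The standalone name htmlEscape is an assumption made for illustration; the regular expressions and the replacement order are those of the function above.

```javascript
// Standalone variant (hypothetical name): the regexes are created once,
// and the returned closure reuses them on every call.
var htmlEscape = (function () {
  var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

  return function (s) {
    return String(s)
      .replace(amp_re, '&amp;')   // must run first, see the note below
      .replace(sq_re, '&#39;')
      .replace(quot_re, '&quot;')
      .replace(lt_re, '&lt;')
      .replace(gt_re, '&gt;');
  };
})();

var escaped = htmlEscape('<a href="x">Tom & Jerry\'s</a>');
// escaped is '&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;'
```

The "&" replacement has to come first; otherwise the ampersands introduced by the other replacements would themselves be escaped again.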

Performance test

I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works.

Using RegExp.compile()

I wanted to avoid using a deprecated function, mostly because this kind of optimization should be done automatically by modern browsers. Here's a version with compiled expressions:

String.prototype.htmlEscape2 = function() {
    var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

    if (RegExp.prototype.compile) {
        amp_re.compile();
        sq_re.compile();
        quot_re.compile();
        lt_re.compile();
        gt_re.compile();
    }

    return function() {
        return this
          .replace(amp_re, '&amp;')
          .replace(sq_re, '&#39;')
          .replace(quot_re, '&quot;')
          .replace(lt_re, '&lt;')
          .replace(gt_re, '&gt;');
    }
}();

Doing so blows everything else out of the water!

Performance test

The reason .compile() gives such a performance boost is that when you compile a global expression, e.g. /a/g, without arguments, it gets converted to /(?:)/ (on Chrome), which renders it useless.

If compilation can't be done, a browser should throw an error instead of silently destroying it.
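The effect can be checked directly with a short sketch. The variable names are chosen for illustration, and the call is guarded because compile() is deprecated and may be missing in some environments.

```javascript
var re = /</g;
var before = 'a<b<c'.replace(re, '&lt;');  // every "<" is replaced

var after = null;
var sourceAfter = null;
var globalAfter = null;

// compile() is deprecated and may be absent, so guard the call.
if (typeof re.compile === 'function') {
  re.compile();                  // no arguments: pattern and flags are reset
  sourceAfter = re.source;       // "(?:)", the empty pattern
  globalAfter = re.global;       // false, the g flag is gone
  // The empty pattern matches once, at position 0, so the replacement text
  // is simply inserted in front of the string and nothing is escaped.
  after = 'a<b<c'.replace(re, '&lt;');
}
```

After the call, the replace no longer scans for "<" at all, which would explain why the benchmark looked so fast while producing wrong output.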

Ja͢ck
  • Look at @Joachim Isaksson comment above. I think he found the trick. – Saram Jul 03 '13 at 10:00
  • @Saram I didn't test with the deprecated `.compile()`, but I doubt that would be a useful statistic. Something in the test benchmark is influencing the results. – Ja͢ck Jul 03 '13 at 10:01
  • @Saram Also, according to [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp), literal regular expressions *are* compiled. – Ja͢ck Jul 03 '13 at 10:14
  • @Saram I've updated the code with a compiled version; I still feel browsers should optimize for this stuff. – Ja͢ck Jul 03 '13 at 12:56
  • I found that `compile()` destroys the regexp, so it cannot be used. Looks like a bug. – Saram Jul 03 '13 at 14:06
-1

Actually there are faster ways to do this.

If you do an inline split and join, you will get better performance.

//example below
var test = "This is a test string";
var test2 = test.split("a").join("A");

Try this and run the performance test.
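Applied to HTML escaping, the split/join approach might look like the sketch below. The function name escapeBySplitJoin is hypothetical, and whether it is actually faster depends on the engine, so it should be measured rather than assumed.

```javascript
// Hypothetical helper: each split/join pair replaces every occurrence of one
// character. "&" is handled first so that later entities are not re-escaped.
function escapeBySplitJoin(html) {
  return html
    .split('&').join('&amp;')
    .split('"').join('&quot;')
    .split("'").join('&#39;')
    .split('<').join('&lt;')
    .split('>').join('&gt;');
}

var result = escapeBySplitJoin('1 < 2 & "ok"');
// result is '1 &lt; 2 &amp; &quot;ok&quot;'
```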

blganesh101
  • Using the replaceAll function : **0.054** seconds/ Using the replaceAll2 function : **0.106** seconds/ Inline Split/Join : **0.111** seconds/ Inline RegExp Object : **0.182** seconds Inline RegExp Object without string: **0.134** seconds – Saram Jul 03 '13 at 10:48