3

One problem:

I want to process a string (str) so that any parenthesised digits (matched by rgx) are replaced by values taken from the appropriate place in an array (sub):

var rgx = /\((\d+)\)/,
    str = "this (0) a (1) sentence",
    sub = [
            "is",
            "test"
        ],
    result;

The result, given the variables declared above, should be 'this is a test sentence'.

Two solutions:

This works:

var mch,
    parsed = '',
    remainder = str;
while (mch = rgx.exec(remainder)) { // Not JSLint approved.
    parsed += remainder.substring(0, mch.index) + sub[mch[1]];
    remainder = remainder.substring(mch.index + mch[0].length);
}
result = (parsed) ? parsed + remainder : str;

But I thought the following code would be faster. It has fewer variables, is much more concise, and uses an anonymous function expression (or lambda):

result = str.replace(rgx, function() {
    return sub[arguments[1]];
});

This works too, but I was wrong about the speed; in Chrome it's surprisingly (~50%, last time I checked) slower!

...

Three questions:

  1. Why does this process appear to be slower in Chrome and (for example) faster in Firefox?
  2. Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array? If not, what are its benefits outside Code Golf?
  3. Is there a way optimise this process, making it both more efficient and as fuss-free as the functional second approach?

I'd welcome any insights into what's going on behind these processes.

...

[Fo(u)r the record: I'm happy to be called out on my uses of the words 'lambda' and/or 'functional'. I'm still learning about the concepts, so don't assume I know exactly what I'm talking about and feel free to correct me if I'm misapplying the terms here.]

guypursey
  • 3,114
  • 3
  • 24
  • 42
  • In JS it's not called "*lamba*", but "*(anonymous) function expression*" though it's the same concept. "*functional*" is the correct term. – Bergi Mar 17 '13 at 20:10
  • JSYK, in IE10 the anonymous function method has about twice as many ops/s – Niet the Dark Absol Mar 17 '13 at 21:12
  • @Bergi Thanks for the clarification and reassurance. I was aware of the term "anonymous function expression" (and its more common usage by users of JavaScript). But I thought "lambda" was *synonymous enough* [?], might appeal to those with more general knowledge of functional programming, and therefore attract answers from those who might be able to say more about optimisation w/r/t different programming paradigms. I've updated the question now to include the term "anonymous function expression". Hope that makes more sense. – guypursey Mar 19 '13 at 08:16
  • @Kolink Thanks for pointing this out. Also interesting to see that IE 10.0 performs better than Firefox 19.0 overall... – guypursey Mar 19 '13 at 08:20
  • @guypursey: Yes, "lambda" is fine and unambiguous as well, it just targets more on the people who know functional programming :-) – Bergi Mar 19 '13 at 12:02

2 Answers2

3

Why does this process appear to be slower in Chrome and (for example) faster in Firefox?

Because it has to call a (non-native) function, which is costly. Firefox's engine might be able to optimize that away by recognizing and inlining the lookup.

Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array?

Yes, it has to do less string concatenation and assignments, and - as you said - less variables to initialize. Yet you can only test it to prove my assumptions (and also have a look at http://jsperf.com/match-and-substitute/4 for other snippets - you for example can see Opera optimizing the lambda-replace2 which does not use arguments).

If not, what are its benefits outside Code Golf?

I don't think code golf is the right term. Software quality is about readabilty and comprehensibility, in whose terms the conciseness and elegance (which is subjective though) of the functional code are the reasons to use this approach (actually I've never seen a replace with exec, substring and re-concatenation).

Is there a way optimise this process, making it both more efficient and as fuss-free as the functional second approach?

You don't need that remainder variable. The rgx has a lastIndex property which will automatically advance the match through str.

guypursey
  • 3,114
  • 3
  • 24
  • 42
Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • `actually I've never seen a replace with exec, substring and re-concatenation` Now [you see it](http://stackoverflow.com/questions/15445935/what-is-a-simple-solution-to-editable-by-end-user-text-replacement-placeholders/15446753#15446753). I can't find a way to do this with `replace` (where we don't have to discard the return value - you can always run your own code and discard the return value). – nhahtdh Mar 17 '13 at 20:37
  • 1
    @nhahtdh: OK, but that's just a workaround for lookbehind in JS. And you could use something like `.replace(/\[(\w+)\]/g, function(m, key, offset, str) { if (offset>0 && str.charAt(offset-1) == '[') return m.slice(1); return table[key]; })` – Bergi Mar 17 '13 at 20:44
  • @Bergi And Chrome is obviously much better at optimizing the hand-made code ;) – a better oliver Mar 17 '13 at 20:45
  • @Bergi: Can you rerun the test? I kinda overwrite the revision when adding a new test. – nhahtdh Mar 17 '13 at 21:46
  • Thanks, I've +1'd this but what do you mean by non-native function in this case? I thought `replace()` was native to JavaScript; I haven't pulled it in from a library. Also what do you mean by 'recognising and inlining the lookup'? I know you're only saying that's *possibly* what Firefox is doing but I'm not sure what it means. Sorry if this is a basic question; I'd describe myself as highly conversant in JavaScript but I don't know much about how the interpretation/execution happens in any browser, so may just need pointing to a good intro or primer on the topic. Thanks. – guypursey Mar 19 '13 at 19:57
  • PS. Thanks for the pointer about the `lastIndex` property -- I was aware of that but as I said to @nhahtdh in the other answer, for some reason I'd overlooked a global regex in the while-loop option. You can even see that in my test prep as I set up two RegExp literals, one for each snippet! * blushes * – guypursey Mar 19 '13 at 19:59
  • I meant that a call to `replace`, `exec` or `substring` is probably less expensive than the call to your lambda function, since the native functions will be optimized and not need to load the custom variable environment. With "inlining" I was referring to http://en.wikipedia.org/wiki/Inline_expansion, which may be one of the optimisations modern JS engines are capable to perform when their static code analysis says so. Don't know a good introduction to that topic unfortunately :-( – Bergi Mar 19 '13 at 20:16
  • @Bergi Thanks for the link. Now it's part of the answer, I've marked this correct. I still don't completely understand as surely functional programming means that a call site has already been replaced by the body of the callee (an anonymous function) so inlining has already occurred. But I suspect this optimisation is taking place at a lower level than I'm currently conversant with... More for me to read up on I guess! – guypursey Mar 21 '13 at 20:05
2

Your while loop with exec() is slightly slower than it should be, since you are doing extra work (substring) as you use exec() on a non-global regex. If you need to loop through all matches, you should use a while loop on a global regex (g flag enabled); this way, you avoid doing extra work trimming the processed part of the string.

var rgR = /\((\d+)\)/g;
var mch,
    result = '',
    lastAppend = 0;

while ((mch = rgR.exec(str)) !== null) {
    result += str.substring(lastAppend, mch.index) + sub[mch[1]];
    lastAppend = rgR.lastIndex;
}
result += str.substring(lastAppend);

This factor doesn't disturb the performance disparity between different browser, though.

It seems the performance difference comes from the implementation of the browser. Due to the unfamiliarity with the implementation, I cannot answer where the difference comes from.

In terms of power, exec() and replace() have the same power. This includes the cases where you don't use the returned value from replace(). Example 1. Example 2.

replace() method is more readable (the intention is clearer) than a while loop with exec() if you are using the value returned by the function (i.e. you are doing real replacement in the anonymous function). You also don't have to reconstruct the replaced string yourself. This is where replace is preferred over exec(). (I hope this answers the second part of question 2).

I would imagine exec() to be used for the purposes other than replacement (except for very special cases such as this). Replacement, if possible, should be done with replace().

Optimization is only necessary, if performance degrades badly on actual input. I don't have any optimization to show, since the 2 only possible options are already analyzed, with contradicting performance between 2 different browser. This may change in the future, but for now, you can choose the one that has better worst-performance-across-browser to work with.

Community
  • 1
  • 1
nhahtdh
  • 55,989
  • 15
  • 126
  • 162
  • Thanks for the tip about the global flagged regex in the while loop. For some reason I'd overlooked that possibility, so that's worth a +1 from me at least :-) I also take your point about choosing the solution with the better worst-performance across browsers; that's a good practical measure. – guypursey Mar 19 '13 at 19:52