
I wrote a small block of code yesterday that uses two for loops to compare objects in two arrays (the arrays are actually the same).

var result = [];
for (var i = 0; i < res.length; i++) {
    var tempObj = {};
    tempObj.baseName = res[i].name;
    tempObj.cnt = res[i].cnt;
    tempObj.matches = [];
    for (var j = 0; j < compareArr.length; j++) {
        if (natural.LevenshteinDistance(res[i].name, compareArr[j].name) === options.distance) {
            tempObj.matches.push(compareArr[j]);
        }
    }
    if (tempObj.matches.length > 0) {
        result.push(tempObj);
    }
}

However, I have been on a functional programming kick the last few months and decided to rewrite the code block in a more functional style, ending up with this:

var result = res
    .map(function(baseItem) {
        baseItem.matches = compareArr
            .reduce(function(acc, compItem) {
                if (natural.LevenshteinDistance(baseItem.name, compItem.name) === options.distance) {
                    acc.push(compItem);
                }
                return acc;
            }, []);
        return baseItem;
    })
    .filter(function(item) {
        return item.matches.length > 0;
    });
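(For reference, here is roughly the shape I was going for, pulled out into a standalone function so it can be tested in isolation. `findMatches`, `distanceFn`, and `maxDist` are names I made up for this sketch; `distanceFn` stands in for `natural.LevenshteinDistance` and `maxDist` for `options.distance`. It also uses `filter` for the inner pass instead of `reduce`, and builds a fresh object rather than mutating the input.)

```javascript
// Sketch: functional version of the two-loop comparison, parameterized
// on the distance function and threshold so it is self-contained.
function findMatches(res, compareArr, distanceFn, maxDist) {
    return res
        .map(function (baseItem) {
            // Collect every comparison item at exactly the target distance.
            var matches = compareArr.filter(function (compItem) {
                return distanceFn(baseItem.name, compItem.name) === maxDist;
            });
            // Return a new object instead of mutating baseItem.
            return { baseName: baseItem.name, cnt: baseItem.cnt, matches: matches };
        })
        .filter(function (item) {
            return item.matches.length > 0;
        });
}
```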

My route felt like it was responding a bit slower. However, the data being iterated over is the result of a database query that may contain tens of thousands of items, and I wanted to make sure I wasn't about to hurt the server's performance for no reason. So I plugged both versions into jsPerf, and the results were saddening: the for loops run at about 2,600 ops/sec, while the functional block runs at about 500 ops/sec. :(

The question is: is my second block poorly written? Can it be improved and brought up to speed? If not, is this normal? I see more and more people pushing functional-style JavaScript.

Am I hurting performance in the name of style? Should I enjoy learning functional languages and just leave them out of my JavaScript?

http://jhusain.github.io/learnrx/

https://github.com/timoxley/functional-javascript-workshop

https://medium.com/javascript-scene/the-two-pillars-of-javascript-ee6f3281e7f3

John Resig seems to be a fan -> http://ejohn.org/blog/partial-functions-in-javascript/

http://shop.oreilly.com/product/0636920028857.do

I realize this post went from very specific to very general very quickly, I'll edit the scope and make a new post if suggested.

EDIT: Added tests for Lodash and Underscore to the group. Lodash comes in second at around 870 ops/sec and Underscore at only 475 ops/sec. Tests here.

I found a benchmark of fast.js vs. a for loop and a native JS function here, and fast.js is similarly blown away by a simple for loop.

Jake Sellers
  • The built-in functions do too much, so they are slower. It's usually not a big deal in practice, but you can keep the abstraction and substitute something like [fast.js](https://github.com/codemix/fast.js/tree/master). – elclanrs Nov 07 '14 at 23:15
  • elclanrs: ah, I should add tests for FastJS and LoDash, I totally forgot about those options. :) – Jake Sellers Nov 07 '14 at 23:17
  • Hmm. I added caching for the length so there isn't a hit on every iteration, and that is actually slower than NOT caching, so to me those results are unreliable. How can not caching be faster than caching? BTW, for me in Firefox the native browser functions were much faster than the libraries. – james emanon Nov 09 '14 at 03:35
  • @jamesemanon: link to your caching tests? I just ran my own tests in Firefox, and although the for loops are still the fastest, the native functions were a close second, with the libs failing miserably, Lodash at only 119 ops/sec. Of course, these results are interesting but also mostly meaningless for me, as Node runs on the V8 engine. – Jake Sellers Nov 09 '14 at 04:58
  • Same as you, except the "cached" version was behind the "for" but ahead of the others. see: http://jsperf.com/forvsfunc/5 , I just created a new one from your initial version. – james emanon Nov 09 '14 at 06:13
  • @jamesemanon: Ah, I misunderstood what you meant by caching. When saving the length of the arrays, it's best and more common to inline it, like `for (var i = 0, rLength = res.length; i < rLength; i++) ...`. I modified your new test and it's now the quickest, on Chrome at least. http://jsperf.com/forvsfunc/6 – Jake Sellers Nov 09 '14 at 20:40

1 Answer


Array methods are inherently slower than for loops since 1) they invoke a callback function (with its own scope) on every iteration, and 2) some of them (.map, .filter) allocate a new result array, which means more memory, more garbage collection, and generally more operations. So if speed is your concern, stay as low-level as possible.
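That allocation point can be made concrete. This is a small illustrative sketch (the numbers and names are invented for the example): each step in a chained pipeline produces a fresh intermediate array, while a single loop can compute the same result in one pass.

```javascript
var input = [1, 2, 3, 4];

// Chained methods: .map allocates one array, .filter allocates another.
var viaMethods = input
    .map(function (n) { return n * 2; })      // intermediate array [2, 4, 6, 8]
    .filter(function (n) { return n > 4; });  // final array [6, 8]

// Single for loop: one output array, no intermediates, no callbacks.
var viaLoop = [];
for (var i = 0; i < input.length; i++) {
    var doubled = input[i] * 2;
    if (doubled > 4) {
        viaLoop.push(doubled);
    }
}
// Both end up as [6, 8]; the loop just gets there with less garbage.
```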

Specific to your algorithm, though, there are a few things you can do to improve the runtime. Your most expensive operation is the LevenshteinDistance call, so optimizing around it will give you considerable speed improvements.

The easiest thing you can do is a length check with an early exit: if two strings' lengths differ by more than options.distance, their Levenshtein distance is guaranteed to exceed options.distance, so you can skip the expensive call entirely:

for (var j = 0; j < compareArr.length; j++) {
    // This check was added
    if (Math.abs(res[i].name.length - compareArr[j].name.length) > options.distance) {
        continue;
    }
    if (natural.LevenshteinDistance(res[i].name, compareArr[j].name) === options.distance) {
        tempObj.matches.push(compareArr[j]);
    }
}
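The same early exit carries over to the functional version by folding the length check into a predicate. As a sketch (`withinDistance`, `distanceFn`, and `maxDist` are names invented here; `distanceFn` stands in for `natural.LevenshteinDistance` and `maxDist` for `options.distance`):

```javascript
// Predicate with a cheap length pre-check before the expensive distance call.
function withinDistance(a, b, distanceFn, maxDist) {
    // Lengths differing by more than maxDist guarantee a Levenshtein
    // distance above maxDist, so skip the expensive call entirely.
    if (Math.abs(a.length - b.length) > maxDist) {
        return false;
    }
    return distanceFn(a, b) === maxDist;
}
```

The inner filter then becomes `compareArr.filter(function (c) { return withinDistance(baseItem.name, c.name, natural.LevenshteinDistance, options.distance); })`, and the pre-check prunes most candidates before the distance function ever runs.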

There are also several improvements to the algorithm itself that are better explained in another Stack Overflow post: https://stackoverflow.com/a/3183199/574576

Stefan