23

I was clustering around 40000 points using kmean algorithm. In the first version of the program I wrote the euclidean distance function like this

var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
    var sum = 0;
    for( var i in p1 ){
        sum += Math.pow( p1[i] - p2[i], 2 );
    }
    return Math.sqrt( sum );
};

The overall program was quite slow taking on average 7sec to execute. After some profiling I rewrote the above function like this

var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
    var sum = 0;
    for( var i = 0; i < p1.length; i++ ) {
        sum += Math.pow( p1[i] - p2[i], 2 );
    }
    return Math.sqrt( sum );
};

Now the programs on average take around 400ms. That's a huge time difference just because of the way I wrote the for loop. I normally don't use for..in loop for arrays but for some reason I used it while writing this function.

Can someone explain why there is this huge difference in performance between these 2 styles?

Josnidhin
  • 12,469
  • 9
  • 42
  • 61
  • 2
    Please note that using `for..in` loops on arrays [can behave different from a regular for loop](http://stackoverflow.com/questions/500504/javascript-for-in-with-arrays). – David Pärsson Nov 30 '12 at 13:11
  • 1
    `for..in` enumerates object keys, for loop increases integer and checks a simple condition .. isn't obvious? – Esailija Nov 30 '12 at 13:12
  • p1 and p2 look like dense arrays of values. I would suspect the interpreter to optimize exactly this case. Also try pulling the querying of p1.length (you know it doesn't change, but the interpreter cannot assume that) out of the loop - should improve even more. – Stefan Nov 30 '12 at 13:14
  • Duplicate of http://stackoverflow.com/questions/242841/javascript-for-in-vs-for – Cerbrus Nov 30 '12 at 13:16
  • This answer here is explanatory: http://stackoverflow.com/questions/5263847/javascript-loops-for-in-vs-for – Cynical Nov 30 '12 at 13:17
  • @Cynical—the linked answer is about a NodeList, not an array, so not really appropriate. Some aspects are the same, but host objects are very different to native objects. – RobG Nov 30 '12 at 13:23
  • @RobG you are right, but I thought the theoretical part regarding arrays and objects was quite clear for the purposes of OP. – Cynical Nov 30 '12 at 13:36
  • Slight improvement http://jsfiddle.net/Up28V/6/ – EricG Nov 30 '12 at 13:55
  • Note that for large data sets, you could benefit from executing p1.length as a variable and using it in the for loop, instead of evaluating p1's length on every loop. – theunexpected1 Oct 12 '14 at 08:42
  • Change `for (var i=0; i – myfunkyside Dec 12 '16 at 06:52
  • And change `i++` to `++i` for optimal performance in every language (I know there's a big debate on whether that actually makes a difference, but `++i` will definitely not be slower, but could be faster, so.. better safe then sorry I say:). – myfunkyside Dec 12 '16 at 06:54

4 Answers4

29

Look at what's happening differently in each iteration:

for( var i = 0; i < p1.length; i++ ) 
  1. Check if i < p1.length
  2. Increment i by one

Very simple and fast.

Now look at what's happening in each iteration for this:

for( var i in p1 )

Repeat

  1. Let P be the name of the next property of obj whose [[Enumerable]] attribute is true. If there is no such property, return (normal, V, empty).

It has to find next property in the object that is enumerable. With your array you know that this can be achieved by a simple integer increment, where as the algorithm to find next enumerable is most likely not that simple because it has to work on arbitrary object and its prototype chain keys.

Esailija
  • 138,174
  • 23
  • 272
  • 326
  • @DeletedComment, yeah only the first line in the specification is really relevant. What I am trying to say that running the complex algorithm of finding the next enumerable key in an object and its protoype chain is far less performant than an integer addition and a boolean check. – Esailija Nov 30 '12 at 13:51
  • 2
    this is vague, not sure what's going on here – Alexander Mills May 05 '18 at 20:43
9

As a side note, if you cache the length of p1:

var plen = p1.length;
for( var i = 0; i < plen; i++ )

you will get a slight speed increase.

...And if you memoize the function, it will cache results, so if the user tries the same numbers you will see a massive speed increase.

var eDistance = memoize(euclideanDistance);  

function memoize( fn ) {  
    return function () {  
        var args = Array.prototype.slice.call(arguments),  
            hash = "",  
            i = args.length;  
        currentArg = null;  
        while (i--) {  
            currentArg = args[i];  
            hash += (currentArg === Object(currentArg)) ?  
            JSON.stringify(currentArg) : currentArg;  
            fn.memoize || (fn.memoize = {});  
        }  
        return (hash in fn.memoize) ? fn.memoize[hash] :  
        fn.memoize[hash] = fn.apply(this, args);  
    };  
}

eDistance([1,2,3],[1,2,3]);
eDistance([1,2,3],[1,2,3]); //Returns cached value

credit: http://addyosmani.com/blog/faster-javascript-memoization/

gkiely
  • 2,987
  • 1
  • 23
  • 37
6

First You should be aware of this in the case of for/in and arrays. No big deal if You know what You are doing.

I run some very simple tests to show the difference in performance between different loops: http://jsben.ch/#/BQhED

That is why prefer to use classic for loop for arrays.

Community
  • 1
  • 1
op1ekun
  • 1,918
  • 19
  • 26
  • In both FF and Chrome for loop works well without caching. However Opera is different story. Version without caching is 2 times slower in that browser... – op1ekun Nov 30 '12 at 15:01
0

The For/In loop, simply loops through all properties in an object. Since you are not specifying the number of iterations the loop needs to take, it simply 'guesses' at it, and continues on until there are no more objects.

With the second loop, you are specifying all possible variable... a)a starting point, b) the number of iterations the loop should take before stopping, c) increasing the count of the starting point.

You can think of it this way... For/In = guesses the number of iterations, For(a,b,c) you are specifying

Kevin
  • 2,684
  • 6
  • 35
  • 64
  • so, the for/in loop already "knows" the number of iterations it needs to make? (sorry, I don't speak technical... too much time spent talking to clients and dumming it down for them) – Kevin Nov 30 '12 at 14:17
  • 1
    yep, it does. Internally there is a property list storing all the properties of a object, and there used to have `__count__` property for the number of enumerable properties in Mozilla's implementation. :) – xiaoyi Nov 30 '12 at 14:24