3

I'm studying for an interview and have been working through some practice questions. The question is:

Find the most repeated integer in an array.

Here is the function I created and the one they created. They are appropriately named.

var arr = [3, 6, 6, 1, 5, 8, 9, 6, 6]

function mine(arr) {
  arr.sort()
  var count = 0;
  var integer = 0;
  var tempCount = 1;
  var tempInteger = 0;
  var prevInt = null
  for (var i = 0; i < arr.length; i++) {
    tempInteger = arr[i]
    if (i > 0) {
      prevInt = arr[i - 1]
    }
    if (prevInt == arr[i]) {
      tempCount += 1
      if (tempCount > count) {
        count = tempCount
        integer = tempInteger
      }
    } else {
      tempCount = 1
    }
  }
  console.log("most repeated is: " + integer)
}

function theirs(a) {
  var count = 1,
    tempCount;
  var popular = a[0];
  var temp = 0;
  for (var i = 0; i < (a.length - 1); i++) {
    temp = a[i];
    tempCount = 0;
    for (var j = 1; j < a.length; j++) {
      if (temp == a[j])
        tempCount++;
    }
    if (tempCount > count) {
      popular = temp;
      count = tempCount;
    }
  }
  console.log("most repeated is: " + popular)
}
console.time("mine")
mine(arr)
console.timeEnd("mine")
console.time("theirs")
theirs(arr)
console.timeEnd("theirs")

These are the results:

most repeated is: 6
mine: 16.929ms
most repeated is: 6
theirs: 0.760ms

What makes my function slower than their?

Anders
  • 8,307
  • 9
  • 56
  • 88
  • I would have guess mine was more efficient because I didn't have the extra loop. – user3347300 Nov 03 '15 at 18:04
  • 2
    you do `sort` in `mine` function, `their` used already sorted array – Grundy Nov 03 '15 at 18:06
  • When I try it, the difference isn't so great as your test. `mine: 0.340ms, theirs 0.191ms` – Barmar Nov 03 '15 at 18:11
  • This is strange. Your should be `O(n*log(n))` because of the sorting, and theirs should be `O(n^2)` because of the nested loops. So your looks faster to me. (You could do it in `O(n)` if you use some extra storage.) – Anders Nov 03 '15 at 18:12
  • 2
    i comment `console.log("most repeated is: " + integer)` and `console.log("most repeated is: " + popular)` lines and get `mine: 0.000ms, theirs 0.000ms` :-D – Grundy Nov 03 '15 at 18:15
  • @Anders I agree his should be faster even tho practically there are 2 iterations of O(n). Honestly I just think this is race condition. Can you try to run them separately? – sinanspd Nov 03 '15 at 18:15
  • 1
    i think for testing array should be a bit bigger – Grundy Nov 03 '15 at 18:17
  • 1
    @grundy yes, like about 1000 times bigger. – Pointy Nov 03 '15 at 20:14

3 Answers3

2

It looks like this isn't a fair test. When you run your function first, it sorts the array. This means their function ends up using already sorted data but doesn't suffer the time cost of performing the sort. I tried swapping the order in which the tests were run and got nearly identical timings:

console.time("theirs")
theirs(arr)
console.timeEnd("theirs")
console.time("mine")
mine(arr)
console.timeEnd("mine")

most repeated is: 6
theirs: 0.307ms
most repeated is: 6
mine: 0.366ms

Also, if you use two separate arrays you'll see that your function and theirs run in the same amount of time, approximately.

Lastly, see Anders' answer -- it demonstrates that larger data sets reveal your function's O(n*log(n)) + O(n) performance vs their function's O(n^2) performance.

neuronaut
  • 2,689
  • 18
  • 24
  • Maybe I am missing something here, but I fail to see how their function would benefit from the array being sorted. It loops through the full array both in the outer and the inner loop anyway. – Anders Nov 03 '15 at 18:24
  • @Anders I may be wrong about that being the reason between the performance difference when running the two functions in the opposite order, but the results speak for themselves. – neuronaut Nov 03 '15 at 18:30
  • @Anders, the answer says " time cost of performing the sort". So mine() is not just with one for loop, the sort() function it calls may have got more loops inside, which consumes more time than theirs(). – Naren Neelamegam Nov 03 '15 at 18:35
2

My test results

I get the following results when I test (JSFiddle) it for a random array with 50 000 elements:

mine:     28.18 ms
theirs: 5374.69 ms

In other words, your algorithm seems to be much faster. That is expected.

Why is your algorithm faster?

You sort the array first, and then loop through it once. Firefox uses merge sort and Chrome uses a variant of quick sort (according to this question). Both take O(n*log(n)) time on average. Then you loop through the array, taking O(n) time. In total you get O(n*log(n)) + O(n), that can be simplified to just O(n*log(n)).

Their solution, on the other hand, have a nested loop where both the outer and inner loops itterate over all the elements. That should take O(n^2). In other words, it is slower.

Why does your test results differ?

So why does your test results differ from mine? I see a number of possibilities:

  • You used a to small sample. If you just used the nine numbers in your code, that is definately the case. When you use short arrays in the test, overheads (like running the console.log as suggested by Gundy in comments) dominate the time it takes. This can make the result appear completely random.
  • neuronaut suggests that it is related to the fact that their code operates on the array that is already sorted by your code. While that is a bad way of testing, I fail to see how it would affect the result.
  • Browser differences of some kind.

A note on .sort()

A further note: You should not use .sort() for sorting numbers, since it sorts things alphabetically. Instead, use .sort(function(a, b){return a-b}). Read more here.

A further note on the further note: In this particular case, just using .sort() might actually be smarter. Since you do not care about the sorting, only the grouping, it doesnt matter that it sort the numbers wrong. It will still group elements with the same value together. If it is faster without the comparison function (i suspect it is), then it makes sense to sort without one.

An even faster algorithm

You solved the problem in O(n*log(n)), but you can do it in just O(n). The algorithm to do that is quite intuitive. Loop through the array, and keep track of how many times each number appears. Then pick the number that appears the most times.

Lets say there are m different numbers in the array. Looping through the array takes O(n) and finding the max takes O(m). That gives you O(n) + O(m) that simplifies to O(n) since m < n.

This is the code:

function anders(arr) {

    //Instead of an array we use an object and properties.
    //It works like a dictionary in other languages.
    var counts = new Object();

    //Count how many of each number there is.
    for(var i=0; i<arr.length; i++) {
        //Make sure the property is defined.
        if(typeof counts[arr[i]] === 'undefined')
            counts[arr[i]] = 0;
        //Increase the counter.
        counts[arr[i]]++;
    }

    var max;             //The number with the largest count.
    var max_count = -1;  //The largest count.

    //Iterate through all of the properties of the counts object
    //to find the number with the largerst count.
    for (var num in counts) {
        if (counts.hasOwnProperty(num)) {
            if(counts[num] > max_count) {
                max_count = counts[num];
                max = num;
            }
        }
    }

    //Return the result.
    return max;

}

Running this on a random array with 50 000 elements between 0 and 49 takes just 3.99 ms on my computer. In other words, it is the fastest. The backside is that you need O(m) memory to store how many time each number appears.

Community
  • 1
  • 1
Anders
  • 8,307
  • 9
  • 56
  • 88
  • 1
    So the reason he got a different result is that he tried too small a sample. – Barmar Nov 03 '15 at 18:23
  • "I fail to see how it would affect the result" - perhaps branch prediction? – ColBeseder Nov 03 '15 at 18:57
  • @ColBeseder Good point. After all, it is [the most upvoted answer on SO](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array). – Anders Nov 03 '15 at 19:20
0

Other answers here already do a great job of explaining why theirs is faster - and also how to optimize yours. Yours is actually better with large datasets (@Anders). I managed to optimize the theirs solution; maybe there's something useful here.


I can get consistently faster results by employing some basic JS micro-optimizations. These optimizations can also be applied to your original function, but I applied them to theirs.

  • Preincrementing is slightly faster than postincrementing, because the value does not need to be read into memory first
  • Reverse-while loops are massively faster (on my machine) than anything else I've tried, because JS is translated into opcodes, and guaranteeing >= 0 is very fast. For this test, my computer scored 514,271,438 ops/sec, while the next-fastest scored 198,959,074.
  • Cache the result of length - for larger arrays, this would make better more noticeably faster than theirs

Code:

function better(a) {
    var top = a[0],
        count = 0,
        i = len = a.length - 1;
    while (i--) {
        var j = len,
            temp = 0;
        while (j--) {
            if (a[j] == a[i]) ++temp;
        }
        if (temp > count) {
            count = temp;
            top = a[i];
        }
    }
    console.log("most repeated is " + top);
}

[fiddle]

It's very similar, if not the same, to theirs, but with the above micro-optimizations.

Here are the results for running each function 500 times. The array is pre-sorted before any function is run, and the sort is removed from mine().

mine: 44.076ms
theirs: 35.473ms
better: 32.016ms
Community
  • 1
  • 1
Scott
  • 5,338
  • 5
  • 45
  • 70
  • 1
    When I compare `theirs` and `better` with a larger random array (50 000 elements), `theirs` is faster than `better`. Since you tested the functions 500 times, I am a bit surprised. Perhaps it is the `console.log` that produces the strange results? – Anders Nov 03 '15 at 19:38
  • 1
    Weird! I guess you just have to pick whichever method works faster for your particular set of elements. – Scott Nov 03 '15 at 19:56