
I need a script to efficiently find all the duplicates in a one-dimensional array. I tried a naive method:

for (var i = 0, ii = arr.length - 1; i < ii; i++)
    for (var j = i + 1, jj = arr.length; j < jj; j++)
        if (arr[i] == arr[j])
            // remove the duplicate

Very simple, but it takes far too long if the array contains a large set of values. The arrays I use often contain hundreds of thousands of values, so the number of iterations required for this operation is HUGE!

Does anyone have an idea?

guy777

2 Answers


Use a LinkedHashSet or OrderedHashSet implementation: it does not allow duplicates and provides expected O(1) insertion, lookup, and deletion. Since your OP says you want to remove the duplicates, there is no faster way to do this than O(n) overall. On an array of 1,000,000 items the max time was 16 ms.

  • Create a LinkedHashSet hs
  • foreach object obj in arr -- hs.add(obj);

Complexity is expected O(n) with a good hash function.
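The two steps above can be sketched in JavaScript, where the built-in `Set` (added in ES2015, after this answer was written) plays the role of a LinkedHashSet: it rejects duplicates and preserves first-insertion order.

```javascript
// Remove duplicates in expected O(n): Set rejects duplicates and,
// like a LinkedHashSet, keeps elements in first-insertion order.
function removeDuplicates(arr) {
  return Array.from(new Set(arr));
}

console.log(removeDuplicates([3, 1, 3, 2, 1])); // [3, 1, 2]
```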

Niklas B.
TheJackal
  • Strictly speaking, hash set does not guarantee `O(n)` complexity. – AlexD Feb 14 '14 at 12:51
  • I never said that it does, I said that removing duplicates from the dataset is worst case O(n). HashSet guarantees O(1) for operations – TheJackal Feb 14 '14 at 13:10
  • I meant that using of hash set does not guarantee overall `O(n)` complexity. No, hash set does not _guarantee_ O(1) due to possible hash collisions. – AlexD Feb 14 '14 at 13:14
  • OK, I'll try this solution. Checking before adding a value to the array seems like a good solution to me! – guy777 Feb 14 '14 at 13:17
  • My mistake, a LinkedHashSet or OrderedHashSet implementation avoids collisions. Could you explain why it does not guarantee O(n)? If the number of objects in the array is known, the capacity can be set. – TheJackal Feb 14 '14 at 13:20
  • See http://en.wikipedia.org/wiki/Hash_table (the summary table in the top right corner: `O(1)` average, `O(n)` worst). – AlexD Feb 14 '14 at 14:38
  • @AlexD a good implementation can achieve O (1) with high probability, which is very very good. Every decent implementation guarantees at least O (1) expected runtime. So the answer here can be implemented in O (n) w.h.p and in practice it will run very fast. No point arguing about that, just edit in the word "expected" and be done with it – Niklas B. Feb 14 '14 at 17:10

This could be the most efficient way to do it. It is nothing but a direct implementation of a set, using a plain object's keys:

function eliminateDuplicates(arr) {
  var i,
      len = arr.length,
      out = [],
      obj = {};

  // Use the object's keys as a set: assigning again simply overwrites,
  // so each value ends up stored exactly once.
  for (i = 0; i < len; i++) {
    obj[arr[i]] = 0;
  }
  // Collect the unique keys back into an array.
  for (i in obj) {
    out.push(i);
  }
  return out;
}
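One caveat worth noting (my addition, not stated in the answer): object keys are always strings, so the values come back as strings, and distinct values with the same string form (e.g. `1` and `"1"`) collide. A self-contained demo of the same object-as-set technique:

```javascript
// Same object-as-set technique as the answer above. Keys are coerced
// to strings, so numeric input comes back as string values.
function dedupeViaObjectKeys(arr) {
  var obj = {}, out = [], i;
  for (i = 0; i < arr.length; i++) obj[arr[i]] = 0;
  for (i in obj) out.push(i);
  return out;
}

console.log(dedupeViaObjectKeys([1, 2, 2, 3])); // ["1", "2", "3"] -- strings, not numbers
```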
MissingNumber