0

If I want to convert an array of objects into a set, how would I do so in the most efficient way possible? A set contains no duplicate values, so I was thinking I could use a generic merge sort algorithm which would split the array into 2 sequences and then, from that point, use the comparator to sort the array and get rid of any duplicate values should an element from sequence A be equal to an element from sequence B. This would give me O(n log n). Is this the right approach, or is there a more accurate/more efficient way of approaching this problem?
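
To make it concrete, this is roughly the merge step I'm picturing (just a rough sketch for Comparable elements; a Comparator version would be analogous, and the class and method names are made up):

import java.util.ArrayList;
import java.util.List;

public class DedupMergeSort {

    // Merge sort that drops duplicates during the merge step, so the
    // result is sorted and contains each value exactly once: O(n log n).
    static <T extends Comparable<T>> List<T> sortedDistinct(List<T> in) {
        if (in.size() <= 1) {
            return new ArrayList<>(in);
        }
        int mid = in.size() / 2;
        // Each recursive call returns a sorted, duplicate-free half.
        List<T> a = sortedDistinct(in.subList(0, mid));
        List<T> b = sortedDistinct(in.subList(mid, in.size()));

        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                out.add(a.get(i++));
            } else if (cmp > 0) {
                out.add(b.get(j++));
            } else {            // equal: keep one copy, drop the other
                out.add(a.get(i++));
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }
}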

  • Yeah that's pretty much it. Either that or you just create a set and insert everything in it. Either way it's `O(nlogn)` – Shahbaz Jan 28 '14 at 16:39
  • You need to say what you want your set's interface to be. And in addition, what the time complexity should be for each method in the interface. Because it's possible to use an array as the implementation of a set, with `O(1)` insertion and `O(n)` containment check. – amnn Jan 28 '14 at 16:39
  • 1
    That's the right approach if you wear the shackles of the comparison black box. If you have hashing or consider the items as bit strings, there are alternatives. Can you do that? What do your objects represent? –  Jan 28 '14 at 16:40
  • @asQuirreL In general you're right, but in this case we want to remove duplicates. –  Jan 28 '14 at 16:40
  • Is this what you're looking for? http://stackoverflow.com/questions/3350641/array-remove-duplicate-elements – user3096052 Jan 28 '14 at 16:41

3 Answers

1

You basically have two choices: use a hashing-based solution, which gives you O(n) average-case performance at the cost of O(n) extra space, or use a sorting-based solution, which is O(n log n) and can be done with very little extra space.

Note that you cannot do better than that (using comparisons alone), because a faster solution would let you solve the element distinctness problem¹ faster than its known lower bound, which is impossible.


(1) By simply creating the set and checking whether its size equals the size of the array: it does if and only if every element is distinct.
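
For concreteness, here is a minimal sketch of the two choices in Java (an illustration only, assuming the elements have sensible hashCode()/equals() and are Comparable; class and method names are made up):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArrayToSet {

    // Choice 1: hashing. Expected O(n) time, at the cost of O(n) extra space.
    static <T> Set<T> viaHashing(T[] array) {
        return new HashSet<>(Arrays.asList(array));
    }

    // Choice 2: comparison-based. Sort a copy (O(n log n)), then a single
    // linear pass keeps the first element of each run of equal values.
    // A sorted, duplicate-free list is itself a valid set representation.
    static <T extends Comparable<T>> List<T> viaSorting(T[] array) {
        T[] copy = array.clone();       // keep the caller's array intact
        Arrays.sort(copy);
        List<T> distinct = new ArrayList<>();
        for (T x : copy) {
            if (distinct.isEmpty()
                    || distinct.get(distinct.size() - 1).compareTo(x) != 0) {
                distinct.add(x);        // duplicates are adjacent after sorting
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        Integer[] array = {1, 1, 1, 2, 3, 6, 1, 2, 5, 7};
        System.out.println(viaHashing(array)); // order not guaranteed
        System.out.println(viaSorting(array)); // [1, 2, 3, 5, 6, 7]
    }
}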

amit
  • I assume that in the second paragraph you are referring to the Θ(n log n) bound in the third paragraph of the Wikipedia article. If so, beware that this bound holds in a particular, restrictive model of computation (algebraic decision trees). The RAM model of computation, which is closer to real machines, allows better solutions. –  Jan 28 '14 at 16:50
0

You can use a Map to remember elements already put in the Set. This is an O(n) algorithm, but keep in mind you will consume more memory. Here's an example:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

Set<Integer> s = new HashSet<>();
Map<Integer, Boolean> m = new HashMap<>();

Integer[] array = {1, 1, 1, 2, 3, 6, 1, 2, 5, 7};

for (Integer i : array) {
    if (m.get(i) == null) {   // not seen before
        m.put(i, true);       // mark as seen
        s.add(i);
    }
}

System.out.println(s);

This prints:

[1, 2, 3, 5, 6, 7]
Happy
0

Surely O(n) is better than O(n log n) when n is non-trivial? In that case, why don't you just iterate over the list and not add duplicates if you see an element has already been added?
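
Whether that ends up O(n) or O(n²) depends on how the "already added?" check is answered; here is a small sketch of both ways (illustrative only, using Integer elements):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SkipDuplicates {
    public static void main(String[] args) {
        Integer[] array = {1, 1, 1, 2, 3, 6, 1, 2, 5, 7};

        // "Already added?" answered by scanning a list: each contains() is
        // O(n), so the whole loop is O(n^2).
        List<Integer> viaList = new ArrayList<>();
        for (Integer x : array) {
            if (!viaList.contains(x)) {
                viaList.add(x);
            }
        }

        // "Already added?" answered by hashing: HashSet.add() reports whether
        // the element was new, so the loop is expected O(n).
        Set<Integer> viaSet = new HashSet<>();
        for (Integer x : array) {
            viaSet.add(x);      // returns false for duplicates
        }

        System.out.println(viaList); // [1, 2, 3, 6, 5, 7] (insertion order)
        System.out.println(viaSet);  // order not guaranteed
    }
}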

Yatharth Agarwal
  • For n insertions (in an empty array or otherwise) this takes O(n²) time. The O(n log n) bound is for n insertions too. Given a sorted array, one can do a single insertion in O(log n) time. –  Jan 28 '14 at 16:46