18

For example in the code below:

public int commonTwo(String[] a, String[] b)
{
    Set common = new HashSet<String>(Arrays.asList(a));
    common.retainAll(new HashSet<String>(Arrays.asList(b)));
    return common.size();
} 
Anirudh
  • 2,286
  • 4
  • 38
  • 64

2 Answers2

16

Lets take a peruse at the code. The method retainAll is inherited from AbstractCollection and (at least in OpenJDK) looks like this:

public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}

There is one big this to note here, we loop over this.iterator() and call c.contains. So the time complexity is n calls to c.contains where n = this.size() and at most n calls to it.remove().

This important thing is that the contains method is called on the other Collection and so the complexity is dependant upon the complexity of the other Collection contains.

So, whilst:

Set<String> common = new HashSet<>(Arrays.asList(a));
common.retainAll(new HashSet<>(Arrays.asList(b)));

Would be O(a.length), as HashSet.contains and HashSet.remove are both O(1) (amortized).

If you were to call

common.retainAll(Arrays.asList(b));

Then due to the O(n) contains on Arrays.ArrayList this would become O(a.length * b.length) - i.e. by spending O(n) copying the array to a HashSet you actually make the call to retainAll much faster.

As far as space complexity goes, no additional space (beyond the Iterator) is required by retainAll, but your invocation is actually quite expensive space-wise as you allocate two new HashSet implementations which are actually fully fledged HashMap.

Two further things can be noted:

  1. There is no reason to allocate a HashSet from the elements in a - a cheaper collection that also has O(1) remove from the middle such as an LinkedList can be used. (cheaper in memory and also build time - a hash table is not built)
  2. Your modifications are being lost as you create new collection instances and only return b.size().
Boris the Spider
  • 59,842
  • 6
  • 106
  • 166
  • Thanks for the explanation, regarding space complexity I think using HashSets would result in increase in hashing as the size of the Method Parameters increase. Don't you think with respect to the size of input the method's space complexity increases linearly or of O(n)? – Anirudh Jul 15 '14 at 11:21
  • @Anirudh as I said _but your invocation is actually quite expensive space-wise as you allocate two new `HashSet` implementations_. These are really quite memory intensive due to all the satellite data. The `retainAll` method itself would use `O(1)` memory I reckon. – Boris the Spider Jul 15 '14 at 11:34
3

The implementation can be found in the java.util.AbstractCollection class. The way it is implemented looks like this:

public boolean retainAll(Collection<?> c) {
        Objects.requireNonNull(c);
        boolean modified = false;
        Iterator<E> it = iterator();
        while (it.hasNext()) {
            if (!c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

So it will iterate everything in your common set and check if the collection that was passed as a parameter contains this element.

In your case both are HashSets, thus it will be O(n), as contains should be O(1) amortized and iterating over your common set is O(n).

One improvement you can make, is simply not copy a into a new HashSet, because it will be iterated anyway you can keep a list.

Thomas Jungblut
  • 20,854
  • 6
  • 68
  • 91
  • 2
    You last comment doesn't work because the array `a` would be modified by the calls to `it.remove()` if it weren't for the fact that `Arrays.ArrayList` doesn't support `remove()`. And `Arrays.ArrayList.size()` always returns the length of the wrapped array. – Boris the Spider Jul 15 '14 at 10:07