3

I wanted to get a list of unique elements from a list with duplicate elements and the order of elements occurring in the list should be maintained.

To achieve this, I could have written an algorithm like:

private ArrayList<T> getUnique(ArrayList<T> list)
{
    // maintain a hashmap of numbers and a uniqueList to be returned(ArrayList<T>)
    // Add element in result list and the hashmap if the element isn't already present in the hashmap, else just add in the hashmap

    HashMap<T, Boolean> map = new HashMap<>();
    ArrayList<T> uniqueList = new ArrayList<>();

    for (T t: list)
    {
        if (map.get(t) == null)
        {
            // t wasn't present so, adding them in map as well as in the list
            map.put(t, true);
            uniqueList.add(t);
        }
    }
    return uniqueList;
}

This algorithm will take O(n) time with O(n) extra space (for HashMap).

Or simply, I could have used the below syntax:

Set<T> set = new LinkedHashSet<>(list);

The above syntax in Java is used for getting a set of unique elements from list with occurrence order of elements same as that of list. Then convert this set in a list. (ArrayList<T> uniqueList = new ArrayList<>(set);)

I am assuming the time complexity here is also O(n). I wanted to know what algorithm Java uses for it.

I see the class is named LinkedHashSet, so I thought that they might be using some LinkedList concepts for achieving this, So I looked into the source code, and found these stuff:

  1. In LinkedHashSet.java, the constructor looks like:

143: public LinkedHashSet(Collection<? extends T> c) 144: { 145: super(c); 146: } here is the source.

  1. So, I looked at the parent class constructor i.e. HashSet, I found:

public HashSet(Collection<? extends E> c) { map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16)); addAll(c); }

  1. Next, I searched for addAll method, I found it in AbstractCollection class(which is the grandparent of HashSet class), the definition of the function is:

public boolean addAll(Collection<? extends E> c) { boolean modified = false; for (E e : c) if (add(e)) modified = true; return modified; }

This is calling add which is like:

public boolean add(E e) { throw new UnsupportedOperationException(); } here.

I couldn't understand this. What algorithm do they use for this task?

Amit Upadhyay
  • 7,179
  • 4
  • 43
  • 57
  • Where is your source/version for LinkedHashSet.java? I see different content with the line number given in https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/util/LinkedHashSet.java. – samabcde Sep 22 '18 at 14:13
  • @samabcde [this](http://developer.classpath.org/doc/java/util/LinkedHashSet-source.html) – Amit Upadhyay Sep 22 '18 at 14:15
  • @samabcde - The line numbers don't matter. This aspect of the code is the same for multiple versions. (And besides, Java 7 is **way out of date**.) – Stephen C Sep 22 '18 at 14:15
  • `add` is a polymorphic method (all methods are in Java). Take a look here https://stackoverflow.com/questions/4605669/what-is-polymorphic-method-in-java – algrid Sep 22 '18 at 14:43

3 Answers3

3

To answer your confusion, the add method is overridden in HashSet as follows:

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

Note that LinkedHashSet extends HashSet extends AbstractSet extends AbstractCollection.


In summary, the algorithm used is:

    for (E e : c)
        add(e);

which is O(N) for a LinkedHashSet since the average complexity of add for LinkedHashSet is O(1).

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • yes, this add method is defined in `HashSet` class, but constructor in the HashSet class is calling the `addAll` method which isn't present in HashSet, I found it in `AbstractCollection` class which is grandparent of HashSet class. So, how can a grandparent's method call child class method? – Amit Upadhyay Sep 22 '18 at 14:23
  • 1
    inheritance. the child class inherits the grandparent's methods – Jason Sep 22 '18 at 14:26
  • 1
    `AddAll` is in `AbstractCollection`. That's the method that the `LinkedHashSet` constructor is calling. So the short answer is (as Jason says) "inheritance". – Stephen C Sep 22 '18 at 14:26
  • @Jason Yes you are right, but I asked `how can a grandparent's method call child class method?` Inheritance will give this capability to the child class. Also, I see the signature of `AbstractCollection` as: `public abstract class AbstractCollection implements Collection` it doesn't inherit. – Amit Upadhyay Sep 22 '18 at 14:31
  • 2
    @AmitUpadhyay - Inheritance gives the "capability" to call (visible) methods declared in **all** superclasses. The parent, the grand parent ... all the way to `java.lang.Object`. ( Provided that the method hasn't be overridden in an intermediate class. ) You probably been to review the basics of inheritance in Java, because this is pretty fundamental. – Stephen C Sep 22 '18 at 14:35
  • 1
    @AmitUpadhyay It is late binding and polymorphism – Maxim Sep 22 '18 at 14:37
  • 1
    @StephenC , thanks, I know what you have clarified above, but I missed out the comment written there `

    Note that this implementation will throw an * UnsupportedOperationException unless add is * overridden (assuming the specified collection is non-empty).` I now got it. Also, what I was asking is how can child method have scope in parent class?.[this](https://ideone.com/oOs1Ht). thanks!

    – Amit Upadhyay Sep 22 '18 at 14:57
3

For someone looking for the whole story

Base on the source code of LinkedHashSet, HashSet, LinkedHashMap. When constructing a LinkedHashSet which extends HashSet with other collection(LinkedHashSet.java line 143),

public LinkedHashSet(Collection<? extends T> c)  
{  
  super(c);  
}

Which will call (HashSet.java line 136):

public HashSet(Collection<? extends T> c)
{
  this(Math.max(2 * c.size(), HashMap.DEFAULT_CAPACITY));
  addAll(c);
}

and then call (HashSet.java line 122):

public HashSet(int initialCapacity, float loadFactor)
{
  map = init(initialCapacity, loadFactor);
}

Since the init method is overrided in LinkedHashSet

HashMap<T, String> init(int capacity, float load)
{
 return new LinkedHashMap<T, String>(capacity, load);
}

The backing map is a LinkedHashMap.

According to java doc of LinkedHashMap

This class provides all of the optional Map operations, and permits null elements. Like HashMap, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashMap, due to the added expense of maintaining the linked list, with one exception: Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.

And the add method of HashSet is

public boolean add(E e) {
   return map.put(e, PRESENT)==null;
}

Hence the average time complexity is O(n) for the construction. For the algorithm, I think you can read the code of LinkedHashMap for details. Further reading How is the internal implementation of LinkedHashMap different from HashMap implementation?, HashSet vs LinkedHashSet

samabcde
  • 6,988
  • 2
  • 25
  • 41
  • 1
    Yes: "requires editing" means that anybody can fix the question by making it answerable. Here, the OP, the person asking the question needs to add information. Then you should rather look for some sort of close reason. Beyond that: I appreciate the quick and friendly comeback! – GhostCat Nov 06 '18 at 11:23
2

this is LinkedHashSet constructor :

public LinkedHashSet(Collection<? extends E> c) {
        super(Math.max(2*c.size(), 11), .75f, true);
        addAll(c);
    }

this is addAll function from java.util.AbstractCollection:

public boolean addAll(Collection<? extends E> c) {
        boolean modified = false;
        for (E e : c)
            if (add(e))
                modified = true;
        return modified;
    }

this is add function from java.util.HashSet:

public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

easy-peasy if you use Intellij to find the source of the function.

kingGarfield
  • 304
  • 1
  • 12