6

I know following things about linkedHashSet

  • it maintains insertion order
  • uses LinkedList to preserve order
  • my question is how does hashing come into picture ??

I understand If hashing is used then the concept of bucketing comes in

However, from checking the code in the JDK it seems that LinkedHashSet implementation contains only constructor and no implementation, so I guess all the logic happens in HashSet?

  • so hashSet uses LinkedList by default ?

Let me put my question this way ... if objective is to write a collection that

  1. maintains unique values
  2. preserves insertion order using a linked list THEN ... it can easily be done without Hashing ... may be we can call this collection LinkedSet

saw a similar question what's the difference between HashSet and LinkedHashSet but not very helpful

Let me know if i need to explain my question more

Community
  • 1
  • 1
Lav
  • 1,283
  • 5
  • 21
  • 48
  • @JanDvorak: Why do you say that? The `public` `HashSet` constructors all initialize the `HashSet` to be backed by a non-linked `HashMap`, and then `LinkedHashSet` just calls a special, package-private constructor that uses a `LinkedHashMap` instead. – Louis Wasserman Feb 01 '13 at 04:19

5 Answers5

1

False. The implementation of LinkedHashSet is really all in LinkedHashMap. (And the implementation of HashSet is really all in HashMap. Le gasp!)

HashSet has no linked list at all.

It's entirely possible to write a LinkedSet collection backed by a linked list, that keeps elements unique -- it's just that its performance will be pretty crappy.

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • I observe something different implementation-wise: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/LinkedHashSet.java#LinkedHashSet.%3Cinit%3E%28%29 – John Dvorak Feb 01 '13 at 04:14
  • http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java#HashSet.%3Cinit%3E%28int%2Cfloat%2Cboolean%29 – John Dvorak Feb 01 '13 at 04:16
  • @JanDvorak: Trace that back to the [`HashSet` constructor that actually gets called.](http://www.docjar.com/html/api/java/util/HashSet.java.html#151) `HashSet` is just implemented as a wrapper around a `Map`, and `LinkedHashSet` just makes sure that that map is a `LinkedHashMap`. – Louis Wasserman Feb 01 '13 at 04:16
  • So the implementation of `LinkedHashSet` is actually in the `HashSet` class, and the `LinkedHashSet` class itself is just a thin wrapper. – John Dvorak Feb 01 '13 at 04:18
  • 1
    The implementaions are really in LinkedHashMap and HashMap, HashSet itself is just wrapping HashMap.keySet() mostly. – Affe Feb 01 '13 at 04:19
  • 1
    @JanDvorak: `LinkedHashSet` is a thin wrapper, yes, but more to the point, `HashSet` is _also_ a thin wrapper around a `Map`. – Louis Wasserman Feb 01 '13 at 04:21
1

It's an 'interesting' implementation. The constructors for LinkedHashSet defer to package-private constructors in HashSet which setup the data structure (a LinkedHashMap) for maintaining iteration order.

HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}

The API designers could simply have exposed this constructor as public, with appropriate documentation, but I guess they wanted the code to be more 'self-documenting'.

Perception
  • 79,279
  • 19
  • 185
  • 195
  • Why would you expose this as public? These are implementation details which probably shouldn't be exposed. – Louis Wasserman Feb 01 '13 at 04:21
  • @LouisWasserman why not make its linking capabilities a part of the contract? – John Dvorak Feb 01 '13 at 04:22
  • 1
    @JanDvorak: The linking capabilities are a part of `LinkedHashSet`'s contract, not `HashSet`'s. Why mix up the two together? Sure, it's convenient for the _implementation_ to mix them together, but that's not anything the user needs to know about. Additionally, it's perfectly possible that that implementation might need to change in the future, which couldn't happen if the JDK had locked itself to supporting that constructor publicly. – Louis Wasserman Feb 01 '13 at 04:23
  • Why not make it public? Of course, the `boolean dummy` would need to be changed to `boolean sorted` (or similar), and logic changed slightly to call the existing `HashSet(int, float)` if `sorted=false`. – Perception Feb 01 '13 at 04:24
  • @Perception because then _every_ implementation (Sun, OpenJDK, MsJava...) would need to provide _that_ constructor just because _one_ implementation _can_ do so. – John Dvorak Feb 01 '13 at 04:25
  • @Perception: I'm arguing that regardless of how much documentation you added, it's an implementation detail that shouldn't be exposed. If users want sortedness, they already have `LinkedHashSet`. – Louis Wasserman Feb 01 '13 at 04:25
  • @LouisWasserman - I'm not saying your arguments are invalid, I'm sure the Sun developers had the same ones. I would counter argue that this constructor is totally misplaced, since it is used *only* by the LinkedHashSet implementation. It should have been placed in the subclass that uses it. – Perception Feb 01 '13 at 04:28
  • I mean...yes, I would have had a package-private `HashSet(Map)` constructor instead of what they used here, but there had to be _some_ constructor in `HashSet` for `LinkedHashSet` to hook into. There wouldn't be any way to push all of the details down. – Louis Wasserman Feb 01 '13 at 04:32
1

If you look closely, you will see it is actually using some protected constructors on the HashSet that are there just for it, not regular ones. e.g.,

HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}

So the keySet being used to back the LinkedHashSet is in fact coming from the implementation of LinkedHashMap, not a regular HashMap like a regular HashSet. It doesn't actually use java.util.LinkedList. It just maintains pointers that form a list within the implementation of the bucket contents (Map.Entry<K,V>)

316    private static class Entry<K,V> extends HashMap.Entry<K,V> {
317        // These fields comprise the doubly linked list used for iteration.
318        Entry<K,V> before, after;
319
320        Entry(int hash, K key, V value, HashMap.Entry<K,V> next) {
321            super(hash, key, value, next);
322        }

Hashing comes into the picture because it's an easy way to create a collection that enforces uniqueness and offers constant-time performance for most operations. Sure we could just use a linked list and add uniqueness checking, but the time for several operations would become O(N) cause you'd have to iterate the whole list to check for duplicates.

Affe
  • 47,174
  • 11
  • 83
  • 83
1

Code Sample

Set<Registeration> registerationSet = new LinkedHashSet<>();
registerationSet.add(new Registeration());

Explanation of Line2.

  1. computes hashCode for Registeration object
  2. search for hashCode in registerationSet to locate the bucket
  3. check for equal object in shortlisted bucket
    • 3.1. if equal found, replace it, with new objects reference
    • 3.2. if not found, append/add Registeration object's reference in bucket

Parallel to it,

A List maintains entry order/queue of all elements inserted

  1. Always, add new reference to the end
  2. In case of replacement(3.1. in above), remove previous occurrence.
0

For a Specific answer to your question

  • how does hashing come into picture? (in a LinkedHashSet)

What the Java Docs say...

  • Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets.
  • This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).

The buckets accessed by a hashcode is used to speed up random access, and the LinkedList implementation is for returning an iterator which spits out elements in insertion order.

Hope i have answered your question?

austindz
  • 71
  • 1
  • 6