0

I have used simple HashSet to store numbers. I have simply added 0 to 99999 numbers to HashSet . But after 65535 , HashSet is not sorted but some different pattern is observed. Why Collection is not sorted although I am adding numbers which are already sorted.Why such different pattern is observed after 65535 ? does 65535 indicates something from this example ?

Code:

import java.util.*;
class TestClass {
    public static void main(String args[] ) throws Exception {

        HashSet<Integer> hsset=new HashSet<>();
        for(int i=0;i<100000;i++)hsset.add(i);
        for(int i:hsset){
            System.out.print(i+" ");
        }
    }
}

Difference in Output from 65535:

65507 65508 65509 65510 65511 65512 65513 65514 65515 65516 65517 65518 65519 65520 65521 65522 65523 65524 65525 65526 65527 65528 65529 65530 65531 65532 65533 65534 65535 65537 65536 65539 65538 65541 65540 65543 65542 65545 65544 65547 65546 65549 65548 65551

NIKUNJ KHOKHAR
  • 791
  • 6
  • 17
  • The point of both answers is: the fact you are seeing numbers come out of your HashSet in order is just a coincidence. – simonwo Jun 14 '17 at 07:26
  • @simonwo Not quite a coincidence (that would be miraculous), but an implementation detail that shouldn't be expected or relied upon. – shmosel Jun 14 '17 at 07:31

3 Answers3

7

HashSet has no guaranteed order for its elements, so what you're seeing is most likely an artifact of how hashing is done and how elements are stored according to their hashes.

If you want a sorted set, TreeSet might be more appropriate. If you just want an ordered collection, then take a look at ArrayList. (Or LinkedHashSet, as Eran notes, which maintains insertion order.)

Keep in mind that a set is mathematically just a number of elements (without duplicates) that are in it while everything else is outside of it. Ordering of the elements among each other doesn't matter at all and isn't even required. However, since some sort of order is sometimes useful for certain algorithms there are special implementations that add this property to the mathematical ideal of a set.

Joey
  • 344,408
  • 85
  • 689
  • 683
  • but I am already adding sorted numbers Collection . then why it unsorts it ? – NIKUNJ KHOKHAR Jun 14 '17 at 07:19
  • It doesn't unsort it. It never guaranteed any ordering to begin with. The order you saw is an implementation detail which obviously doesn't persist after a certain number of elements. Heck, you could be seeing different behaviour on different JVMs, different versions of the same JVM, or even different invocations of your program. Nothing is guaranteed there. – Joey Jun 14 '17 at 07:20
  • @Joey I have tried different inputs. It sorts upto 65535 . does this number signifies something ? – NIKUNJ KHOKHAR Jun 14 '17 at 07:22
  • 2
    @NIKUNJKHOKHAR: That's an implementation detail that you're *explicitly instructed not to depend on*. From the first paragraph of the docs: "It makes no guarantees as to the iteration order of the set" – Jon Skeet Jun 14 '17 at 07:26
  • @NIKUNJKHOKHAR: As noted, **it should absolutely not matter to you, as the `HashSet` class makes no such guarantees!**. Take a look at the source code if you're *really* curious, but the collection classes (being some of the most-used ones in the runtime library) are often not easy to read. In the end it comes down to `HashMap.add` since that's used inside `HashSet`. `HashMap` uses different strategies depending on configuration, how full the map is, and how many items there are. I'd guess you're seeing the result of such a switch from one storage strategy to another. – Joey Jun 14 '17 at 07:27
  • @NIKUNJKHOKHAR In case it has escaped your notice 65535 is (2^16)-1. That number jumps out a people who think in binary... Hashmap works on hashtable sizes that are powers of 2. As others point out this will be an artifact of how the class is implemented. It may be iterating through the hash-table which just happens to be filled in order up to that value. Maybe the hashtable is 65536 elements. This effect will vary from implementation to implementation of `HashMap`. – Persixty Jun 14 '17 at 07:33
4

As I explained in https://stackoverflow.com/a/2144822/139985, the apparent ordering that you are seeing is an artefact of 1) the implementation of Integer.hashCode() and 2) the particular way that you are populating the HashSet.

While the entries in your set appear to be ordered (until the set reaches a threshold size), this is an accidental consequence of the implementation, not a property that you can (or should) rely on. (It is not exactly a "coincidence" ... because the behavior is not random.)

The HashSet API makes no guarantees of ordering. If you want a set that is guaranteed to maintain an order use:

  • TreeSet for a set where the entries are sorted based on Comparable or Comparator semantics, or
  • LinkedHashSet for a set where the (temporal) order of insertion is preserved.
Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
1

HashSet has no ordering. Iteration order happens to depends on the hashCode() of the elements and how they are mapped into bins of the HashSet, but that's an implementation detail.

If you add the elements in sorted order and want to keep them sorted (i.e. be able to iterate over them in sorted order), use LinkedHashSet, since it maintains insertion order.

Your elements are Integer, and the hashCode() of an Integer is simply the int value of that Integer.

HashMap (which is used by HashSet), takes the hashCode() of the element (or key) and performs the following transformation on it to compute the bin index:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

For hashCodes smaller than 2^16 (i.e. smaller than 65536), the expression (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); has the same value as key.hashCode().

Therefore each of the first 65535 elements are put in a bin whose index is hashCode() modulu (the number of bins), and since HashSet and HashMap maintain a load ratio < 1 (the default is 0.75), the number of bins is higher than the number of elements. This means that each Integer element i which is smaller than 65536 is stored in bin i.

When you iterate over the elements of the HashSet, they are returned to you according to the indices of the bins (first the elements of bin 0, then the elements of bin 1, etc...). The bin 0 contains value 0, bin 1 contains value 1, etc..., the first 65535 elements appear sorted.

Once you add 65536, the "ordering" breaks.

Eran
  • 387,369
  • 54
  • 702
  • 768