2

When I run below code it is always giving the o/p in natural/ alphabetical order. As per I know HashSet doesn't sort the entries. I know that HashSet is backed by HashMap and not LinkedHashMap. I tried to explore the source code of HashSet and HashMap but couldn't find the code for this behavior.

From the source code there is below constructor in HashSet class:

HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }

which uses LinkedHashMap. If I had used this constructor I would think this is the reason for this behavior but I'm not using this constructor.

Could someone please explain the reason/ code for this behavior?

Here's my simple code:

Set<String> mySet = new HashSet<>();

        mySet.add("D");
        mySet.add("B");
        mySet.add("1");
        mySet.add("E");
        mySet.add("A");
        mySet.add("F");

        mySet.stream().forEach(x -> System.out.println(x));

OP:

1
A
B
D
E
F
Shubhendu Pramanik
  • 2,711
  • 2
  • 13
  • 23
  • 4
    Coincidence, effectively - quite possibly linked to those hash codes being in order and within a small range. Try adding "BX" into the mix, for example... – Jon Skeet Feb 01 '18 at 10:42
  • The set order is affected by the hash code; the hash code is affected by the value; the values are in order; this can lead to the set elements being in order. – khelwood Feb 01 '18 at 10:44

5 Answers5

3

This is coincidence because the default HashSet is bigger than the range of the hashes and there are no collisions, and the hashes for the Strings end up being in alphabetical order.

This is the code for String.hashCode:

   public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

As you can see, the hash for one-character string ends up just being the character value.

The default capacity of HashSet is 16, which means all your values end up in the bucket char value % 16 which turns out to be alphabetical order for your example. Try with "2" instead of "1", for example, this should end up after "A". Even if you swap "A" and "1" this should swap them in the output too. See Ascii table.

kutschkem
  • 7,826
  • 3
  • 21
  • 56
1

From Java 8 Docs

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

So in other words, you cannot rely on the order of the elements in a HashSet.

Alexander
  • 53
  • 1
  • 9
0

Use the following code and you would see that the hashcode of the added elements are in a ascending order :

Set<String> mySet = new HashSet<>();

mySet.add("D");
mySet.add("B");
mySet.add("1");
mySet.add("E");
mySet.add("A");
mySet.add("F");

mySet.stream()
     .forEach(x -> System.out.println(x + " : " + x.hashCode()));

System.out.println(mySet);

1 : 49

A : 65

B : 66

D : 68

E : 69

F : 70

[1, A, B, D, E, F]

Here you used a very particular example : you added only Strings containing a single character (letter or number).
As hashcode of these correspond to their ASCII code you get a predictable order that respects the ASCCI order.

Distinct hashcode values ares physically represented by distinct elements of an array in the HashMap implementation :

transient Node<K,V>[] table;

And Iterator of HashMap iterates array elements index by index.
Whereas the result.

Now the ASCII order that the Map uses to iterate that looks like a natural order for numeric and alphabetical character is right only for very simple cases where added Strings are composed of only 1 letter or 1 digit.

Add Strings that contain more than one of character and you would have a unpredictable order :

Set<String> mySet = new HashSet<>();
mySet.add("Dad");
mySet.add("Mum");
mySet.add("15454");
mySet.add("90000");

mySet.stream()
     .forEach(x -> System.out.println(x + " : " + x.hashCode()));

System.out.println(mySet);

90000 : 54118329

Mum : 77733

15454 : 46883119

Dad : 68455

[90000, Mum, 15454, Dad]

davidxxx
  • 125,838
  • 23
  • 214
  • 215
  • don't think `hashcode` is relevant here. For input `D-B-1-E-BX-A-F` op is `1 : 49 A : 65 B : 66 D : 68 E : 69 BX : 2134 F : 70` and for `D-B-1-E-A-F-BX` op is `1 : 49 A : 65 B : 66 D : 68 E : 69 F : 70 BX : 2134` – Shubhendu Pramanik Feb 01 '18 at 11:03
  • @Shubhendu Pramanik It is relevant here because the OP added String with one character. I added to be more specific. – davidxxx Feb 01 '18 at 11:22
0

The hashCode of Strings of length 1 hash just the sole char, and its hash code is its own numeric value. Voilà, all is ordered.

This phenomenon partly also can be found for Strings with the same prefix, of the same length and is relevant for security exploits. (I believe MD5 needs an artifical seed.)

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
0

This is just a coincidence for your testing/working data set which is generating ordered hashes. I have added some more element to your set. Try to run the below code, I think you will get your answer.

Set<String> mySet = new HashSet<>();

mySet.add("D");
mySet.add("B");
mySet.add("1");
mySet.add("E");
mySet.add("A");
mySet.add("F");
mySet.add("C");
mySet.add("Z");
mySet.add("M");
mySet.add("Q");


mySet.stream().forEach(x -> System.out.println(x));

Here is my output (which is not in natural order) : 1 A Q B C D E F Z M

Amit Bera
  • 7,075
  • 1
  • 19
  • 42