5

So what I know is that a HashSet has no real sorting capabilities like a SortedSet, however I stumbled upon this :

When I run the following code :

 public static void main(String[] args) {
    Set<String> collection = new HashSet<String>(2000);
    String[] data = {"a", "c", "g", "f", "b", "f", "b", "d","q","r","d","m"};
    for(String input: data)
    {
        collection.add(input);
    }
    System.out.println("Output: " + collection);
}

I get the following output : Output: [a, b, c, d, f, g, m, q, r]

Which is alphabetically sorted. Why is that? Since a HashSet is not a sorted set.

So I tried with a string of characters instead of a single character :

public static void main(String[] args) {
    Set<String> collection = new HashSet<String>(2000);
    String[] data = {"atjre", "crj", "gertj", "fertj", "berj"};
    for(String input: data)
    {
        collection.add(input);
    }
    System.out.println("Output: " + collection);
}

And i get the following output : Output: [crj, atjre, fertj, gertj, berj]

Now they are not sorted anymore, any explanations for this? Or is this just a random coincidence?

  • HashSet is implemented based on hashcodes. – ifly6 Jun 11 '18 at 14:51
  • 1
    that is, because of the way the data is stored inside the hashSet (using `equals()` and `hashCode()`) That way, when comparing `'a'` with `'g'`, `'a'` is always smaller and so on, thus it is sorted – Lino Jun 11 '18 at 14:51
  • 1
    Possible duplicate of [How to sort a HashSet?](https://stackoverflow.com/questions/22391350/how-to-sort-a-hashset) – Nikolas Charalambidis Jun 11 '18 at 14:57

2 Answers2

4

HashSet implements Set interface. It means that there is no guarantee of order of elements.

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. Source

Over the time after you adding, deleting few times you can see the difference.

However, "no guarantee of ordering" does not imply "guaranteed random ordering". Exact answer of your question is,

The hashcode-method of the String class also comes into play here, for single character Strings the hashcode will just be the int value of the one char in the String. And since char's int values are ordered alphabetically, so will the computed hashes of single char Strings.

0

As per the Java docs: https://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html

It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

What I think you are experiencing here is a hash-function distribution anomaly. The hash-function is used internally to give your strings an integer index. For 1-long strings there isn't much complexity. As you make your strings longer, your hash function has more to work with.

This stems back to the whole idea of a hash function: take a set of possible values, and map them as evenly as possible to a set of smaller values. It just so happens that the hash function has those strings mapped as it does. You would probably see the same thing with consecutive numbers. And you start to see them un-ordered once more data is introduced.

Ori Almog
  • 59
  • 4