7
Set<Integer> s = new HashSet<Integer>();
s.add(77);
s.add(0);
s.add(1);
 
System.out.println(s);
 
TreeSet<Integer> s1 = new TreeSet<Integer>();
s1.add(77);
s1.add(0);
s1.add(1);
 
System.out.println(s1);

Output:

s = [0, 1, 77]
s1= [0, 1, 77]

By definition on the tutorials point page

A Set is a generic set of values with no duplicate elements. A TreeSet is a set where the elements are sorted.

Why is the output for both s and s1 sorted? I was expecting only s1's output to be sorted.

user
  • 7,435
  • 3
  • 14
  • 44
san
  • 263
  • 2
  • 7
  • 14
  • 3
    Just chance. Add a few more numbers and the `HashSet` will be unordered. – Augusto Oct 11 '20 at 18:14
  • 2
    `s1` is sorted because it's a TreeSet, `s` *looks* like it's sorted, but add a few more elements and you'll see it gets weird because of modulo. [How do HashSets in Java work?](https://stackoverflow.com/questions/9119840/how-do-hashsets-in-java-work) – user Oct 11 '20 at 18:15

4 Answers4

6

You are right that s1 is sorted because it's a TreeSet, but s isn't really sorted. If you add a few more elements to s, you can see something weird happening.

After adding 32, 12, 13, and 14
[0, 32, 1, 12, 77, 13, 14]

So now you see that it's somewhat ordered, but not really. The reason for that is that HashSet has a capacity of 16 by default, and it will grow later as the need arises. So when you add an element to it, picture it taking the hashcode of that element, modulo 16 so that it fits inside the internal "list" of 16 buckets. Since the hashcode of an Integer is the int it represents, we can pretend all it does is do (the element to be added) % 16.

So when you added, 0, 1, and 77, it probably looked something like this internally:

Bucket   Elements
0        0
1        1
2
3
4
5
...
13       77 (77 % 16 is 13, so it's placed here)
14
15
16

Then we add 32. 32 % 16 is 0, but we already have 0 in the first bucket. Luckily, to prevent collisions such as this one, HashSets use linked lists instead of single elements, so we just append 32 to our linked list containing 0.

Bucket   Elements
0        0 -> 32
1        1  
2
3
4
5
...
13       77
14
15
16

It works the same way for 12, 13 and 14 too:

Bucket   Elements
0        0 -> 32
1        1
2
3
4
5
...
12       12
13       77 -> 13
14       14
15
16

And if you go through the buckets in order, printing each linked list in that bucket, you get [0, 32, 1, 12, 77, 13, 14]

See it run

user
  • 7,435
  • 3
  • 14
  • 44
3

This happens only by chance.

HashSets are a special implementation of HashMap but they still use the hashCode to place object in buckets.

The standard hashCode for an Integer is the int value itself.

Specifying such low values coupled with the load factor and bucket algorithm causes them to be placed in different buckets based on that code but the buckets happen to be sequential. If you change values to something larger they will not be ordered because the algorithm only uses part of the hashCode to select the bucket, so the chance of them being sequential is reduced. This would also be the case for much larger sets of randomly distributed numbers.

Set<Integer> s = new HashSet<Integer>();

s.add(57999999);
s.add(67999999);
s.add(77999999);

System.out.println(s);

TreeSet<Integer> s1 = new TreeSet<Integer>();

s1.add(57999999);
s1.add(67999999);
s1.add(77999999);


System.out.println(s1);

On my machine runing Windows and Java 14, they print as follows:

[67999999, 77999999, 57999999]
[57999999, 67999999, 77999999]
WJS
  • 36,363
  • 4
  • 24
  • 39
  • 1
    Alternatively: `Set s = new HashSet<>(4); s.add(77); s.add(0); s.add(1); System.out.println(s);` prints `[0, 77, 1]`. – Holger Oct 12 '20 at 12:42
  • I thought I could force a similar behavior setting the load factor. I didn't consider the capacity. So when I did that I arbitrarily set the capacity to 16. – WJS Oct 12 '20 at 13:53
  • 3
    The key point are the lowest three bits, `0 → 000`, `1 → 001`, `77 → 101`. This implies that with a capacity of eight or higher, there are no collisions and these specific three values will happen to appear in their natural order. The only thing, the load factor can do, is influence whether insertions will make the capacity even bigger. With a capacity of two or four, you’ll have a collision on `1` and `77` (lowest bits `01`) and these two items will appear in their insertion order. Use `new HashSet<>(1, 3f)`, to force a capacity of one, and all three elements will appear in insertion order… – Holger Oct 12 '20 at 14:45
1

TreeSet implements SortedSet, that's why all elements that you've added to it will be sorted before actually adding.

In your case you're working with Integer. It's implements Comparable, it has default compare mechanism. And you TreeSet using this Integer's Comparable implementation to compare your Integers before adding them to collection.

So, in general when working with TreeSet you have two possible solutions:

  1. Your TreeSet should work only with objects that implements Comparable interface, because they need to have compareTo implementation. This implementation will be used by TreeSet to perform sorting.
  2. If your class isn't implementing Comparable you can pass to constructor your Comparator implementation when creating an instance of a TreeSet.

HashSet do not perform any sorting inside, it's just a random here. Elements goes to different buckets based on their hashcodes and in your case they matched up so.

UPD: HashSet it's not a random, my answer may be confusing, take a look at this explanation of some HashSet principles if interested.

maveriq
  • 456
  • 4
  • 14
1

Why is the Set interface returning an ordered list?

Set Interface in Java, or a Set concept, in general, can be thought as an Abstract Data Type, as a contract of the behaviour specification, which declares a collection that contains no duplicate elements.

So, Set<E> interface, itself, does not return anything; however, it can be implemented in different ways.

TreeSet<E> definition says, that:

The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.

HashSet<E> definition, on the other hand, says, that:

it does not guarantee that the order will remain constant over time.

Giorgi Tsiklauri
  • 9,715
  • 8
  • 45
  • 66