-4

PS: How is this HashSet producing sorted output? this post doesn't answer my question. I know that if I put any numbers into hashset, I will not get sorted order.

However, I found that if I put all [1, 2, 3, ..., n] into a HashSet with any shuffled order and iterate the HashSet, I will get a guranteed sorted order. I cannot understand why it will always happen. I've tested any n < 10000 for many times, it's always true, therefore it should not be a coincidence and it should have some reason! Even though I should not rely on this implement details, please tell me why it always happens.

PS: I know that if I insert [0,1,2, ..., n-1], or [1+k, 2+k, .., n+k] (k != 0) into HashSet, the iteration order is unsorted and I've tested. It's normal that iteration order of HashSet is unsorted. However, why any insertion order of [1,2,3,4,..,n] is accidentally always true? I've checked the implementation details. If I track the path, the whole process will inculde the resizing the bucket array, and transformation from linkedlist to red-black tree. If I insert the whole [1-n] in shuffled order, the intermediate status of the HashSet is unsorted. However it will accidentally have sorted order, if I complete the all insertions.

I used the JDK 1.8 to do the following test.

public class Test {

    public static void main(String[] args) throws IOException {
        List<Integer> res = printUnsortedCase(10000);
        System.out.println(res);
    }


    private static List<Integer> printUnsortedCase(int n){
        List<Integer> res = new ArrayList<>();
        for (int i = 2; i < n; i++) {
            if (!checkSize(i)) {
                res.add(i);
            }
        }
        return res;
    }

    private static boolean checkSize(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
 
        // here I've shuffled the list of [1,2,3,4, ...n]        
        Collections.shuffle(list);

        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(list.get(i)); // I insert the set in an unsorted order of [1,2,3,..,n]
        }

        list = new ArrayList<>(set);// iterate over the HashSet and insert into ArrayList
        return isSorted(list);
    }

    private static boolean isSorted(List<Integer> list) {
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i - 1) > list.get(i)) return false;
        }
        return true;
    }
}

I've wrote the above checking code and it seems true.

maplemaple
  • 1,297
  • 6
  • 24
  • Why do you expect a different order every time? The set will build a data structure based on the elements you put it. Put in the same elements, get back the same structure. From the same structure, get the same iteration order. – Thilo Mar 28 '21 at 08:13
  • @Thilo I shuffled the ArrayList of [1, 2, 3, 4, ...] – maplemaple Mar 28 '21 at 08:14
  • For 10k users: this question is a repost of https://stackoverflow.com/questions/66839418/why-will-i-get-a-guaranteed-sorted-iterator-of-hashset-if-i-put-all-1-to-n-into (with a smaller `n` upper bound) – Tom Mar 28 '21 at 08:15
  • Yes, but the insertion order does not matter for the set. As long as you put in the same numbers, it still creates the same set (mathematically, and also data-structure wise). – Thilo Mar 28 '21 at 08:15
  • 1
    Have you actually read [How is this HashSet producing sorted output?](https://stackoverflow.com/q/18648521/3890632) which was linked last time you posted this question? – khelwood Mar 28 '21 at 08:16
  • @Thilo Therefore why iteration of set of any insertion order of [1, 2, ..n] is sorted? – maplemaple Mar 28 '21 at 08:17
  • 5
    Insertion order doesn't matter, it's the numbers hashing to themselves. It's an implementation detail you can't rely on. – ggorlen Mar 28 '21 at 08:18
  • @ggorlen No. It's totally a different question – maplemaple Mar 28 '21 at 08:19
  • Okay, that is interesting and unexpected. But don't depend on this. – Thilo Mar 28 '21 at 08:19
  • @ggorlen If insert {1,5,3}, the set is unsorted. But if you insert {1, 5,3,2,4}, then it will always sorted. I don't know why – maplemaple Mar 28 '21 at 08:20
  • 3
    Probably because `{1, 5, 3}` isn't sequential, but it's basically inconsequential and can't be relied upon. Java says "sets aren't ordered", so when you put stuff in and they _happen to appear ordered_, what value is there in trying to reason about why? The internals of the hash logic are such that sometimes things are ordered. The other question covers this in detail. – ggorlen Mar 28 '21 at 08:21
  • @ggorlen No matter what shuffled order of [1,2,3,4,5], you can try. – maplemaple Mar 28 '21 at 08:21
  • What JDK version are you using? – Thilo Mar 28 '21 at 08:21
  • @Thilo I'm using JDK 1.8 – maplemaple Mar 28 '21 at 08:22
  • 2
    This happens because all these small numbers just hash to their exact bucket array index. Add 10000 to each number and it won't be sorted anymore. For `{1,5,3}` there were probably only 4 buckets, so 5 cannot go into bucket 5. – Thilo Mar 28 '21 at 08:28
  • @Thilo I've tested that [0, 1,2, n-1], or [1+k, 2+k, .., n+k] (k != 0) is not true. Why [1,2,3,4,..,n] is always true? – maplemaple Mar 28 '21 at 08:30
  • 1
    In my jdk 1,5,3 is sorted. So if you are curious why so in your case then you can check HashSet implementation. Refer https://stackoverflow.com/questions/29258944/where-is-the-src-zip-for-jdk8u40 – the Hutt Mar 28 '21 at 08:30
  • The iteration order returned by HashSet is the iteration order over its internal buckets. In general, that is not related to anything directly, but you have found an edge-case where the value of the element is the same as its hashcode and the hashcode is also small enough to be mapped directly into an array index that also matches the value of the element. Dive it what you get with a debugger – Thilo Mar 28 '21 at 08:32
  • @onkarruikar I've checked. If I track the path, the whole process will inculde the resizing the bucket array, and transform the linkedlist to red-black tree. However finally it accidentally has sorted order. – maplemaple Mar 28 '21 at 08:33
  • 1
    it is sorted but **not guaranteed** maybe in some future java version it is changed (probably not, but...) ( "I cannot guarantee that it will rain" - is not a guarantee that it will not rain) –  Mar 28 '21 at 08:39
  • 4
    _However finally it accidentally has sorted order_. The specification doesn't say "the iteration order will not ever be sorted". It says, effectively, "don't rely on it being sorted". You found an input pattern that happens to reliably produce sorted results. Big deal--nobody has said such a pattern doesn't exist. What are you trying to prove here? – ggorlen Mar 28 '21 at 08:40
  • 1
    The fact you can consistently reproduce a certain behaviour does not mean it's *guaranteed*. The implementation could - theoretically - change between point-releases and change this entirely. – Mark Rotteveel Mar 28 '21 at 15:34
  • @ggorlen I certainly know it's not guaranteed and I should not relpy on this. However, as a STEM student, when I see some phenomenon always happens after some complicated intermediate process, I just want to know why it always happens at least for JDK1.8. – maplemaple Apr 26 '21 at 18:58

1 Answers1

5

You are conflating two related concepts:

  1. guaranteed order: the specification says that you will get the elements back in a specific order and all implementations conforming to that spec will do so.
  2. reproducible order: a specific implementation returns all the elements back in a specific order.

Guaranteed order necessarily implies reproducible order (otherwise you'd have a bug).

Reproducible order doesn't imply guaranteed order. It's possible that the reproducible order is just a side effect of some implementation details that happens to align so that you get the elements in the same order under some circumstances, but this isn't guaranteed.

In this specific case several factors together result in a reproducible order:

  • Integer has a highly reproducible and predictable hashCode (it's just the number itself)
  • HashMap does some minor manipulation on that hash code to decrease the chances of collisions by simple hash code implementations, which doesn't matter in this case (because it just does hash ^ (hash >>> 16) which keeps number <= 216 equally-sorted).
  • You use a very consistent and reproducible way to construct your HashMaps. The resulting hashmaps will always have gone through the same growing stages.

If instead of

        list.add(i);

you did

        list.add(i + 65000);

(i.e. use the number 65000 to 65000+n instead of 0 to n) then you'd see the non-sorted results emerge.

In fact the "reproducible order" that you get is so fragile that just adding 10 already causes some of the lists to be unsorted.

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
  • I know that [0,1,2, ..., n-1], or [1+k, 2+k, .., n+k] (k != 0) is not true and I've tested. It's normal that iteration order of HashSet is unsorted. However, why any order of [1,2,3,4,..,n] is accidentally always true? I've checked the implementation details. If I track the path, the whole process will inculde the resizing the bucket array, and transform the linkedlist to red-black tree. However finally it accidentally has sorted order. – maplemaple Mar 28 '21 at 08:37
  • 3
    @maplemaple: I've not done a full analysis, but it's **possible** that with the current implementation this will hold true, but if it does, it's accidental: you can't rely on that being the case and that's the whole **point** of guaranteed order: yes, you *could* get them in some order, but you can't depend on it. Obviously you also can't depend on the opposite: you shouldn't treat the return order of a `Set` as a source of randomness because "not specified to return in a specific order" is very different from "specified to return in a random order". – Joachim Sauer Mar 28 '21 at 08:40
  • @maplemaple Did you read the part about the hashCode being the number itself for small numbers? Did you go over 2^16 ? – Thilo Mar 28 '21 at 08:40
  • 2
    @maplemaple: think of it this way: if the `hash(Object)` function in `HashMap` would flip some specific bits in the input with each other in a deterministic way, then the result of your code would still be "sorted" according to some hard-to-intuit-but reproducible way. I don't think you would have complained (or noticed) in that case. The only thing that's different here is that some artifact of the internal implementation is "obviously" visible to the human eye. There are **tons** of such artifacts, but most of those simply don't look obvious. – Joachim Sauer Mar 28 '21 at 08:45
  • 1
    @maplemaple: I hope you now also understand the strong push to close as duplicate of [this question](https://stackoverflow.com/questions/18648521/how-is-this-hashset-producing-sorted-output). Both that and your question ask "why does specific setup X produce output Y when the spec says Y is not guaranteed" and the only difference between the two questions is the specific setup X. – Joachim Sauer Mar 28 '21 at 08:57