-2

I'm using Strings (long sentences) with HashSet and I'm trying to shuffle them to get a random sentence every time the program runs but this is not happening

public class testshuffle {

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            run();
        }
    }

    public static void run() {
        ArrayList<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        list.add("Alexandria And Mimy are good people");
        list.add("Bob And Alexandria are better than Mimy");
        list.add("Camelia And Johanness are better than Bob And Alexandria");

        shuffle(list, ThreadLocalRandom.current());
        set.addAll(list);
        System.out.println(set);
    }
}

I know HashSet order is not guaranteed. When using Integer or Double, the hashCode returned would likely cause the element to be sorted.

But here I'm using Strings and the output is:

[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
.
.
.
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]

Please don't mark this as duplicate because this is different from the cases I found here

Eltorrooo
  • 157
  • 2
  • 15
  • 1
    Why is this result unexpected? – khelwood Jan 31 '18 at 11:11
  • 2
    "Please don't mark this as duplicate because this is different from the cases I found here" - that doesn't mean that it *isn't* a duplicate, it just means either you haven't found the right question, or you haven't recognized that one you were looking at *is* a duplicate. – Jon Skeet Jan 31 '18 at 11:12
  • "I know HashSet order is not guaranteed" - doesn't that answer your question then? The HashSet can order things any way it likes. It looks like you're observing a stable order regardless of the order in which the values are added... which is entirely within the bounds of what the HashSet is allowed to do. Fundamentally, "shuffling" doesn't make sense for a collection where you don't control the order. – Jon Skeet Jan 31 '18 at 11:13
  • 2
    The fact that the order is not guaranteed doesn't mean that it isn't deterministic, or that it is random. You've shuffled your list. Print your list, not your set, and you'll have a random order. – JB Nizet Jan 31 '18 at 11:13
  • You seem to think HashSet has sorted the list alphabetically. No, it hasn't. It has sorted according to its own rules. They just happen by chance to have resulted in an alphabetical order in this case. You only have 3 items in the set. There's not many ways they can be ordered. – DodgyCodeException Jan 31 '18 at 11:26

3 Answers3

0

HashSet order is not guaranteed

This is not exactly true, what order? If native order (1<2, a < b), it's true. But when put into HashSet, it has its own order base on hashcode of elements, it means if all elements have unique hashcode, you run 1000 times, the orders are always the same!

If you change the code into this:

    list.add("Alexandria");
    list.add("Bob");
    list.add("Camelia");

The result is:

[Bob, Camelia, Alexandria]
[Bob, Camelia, Alexandria]
[Bob, Camelia, Alexandria]

You see? No Alphabet order!

yelliver
  • 5,648
  • 5
  • 34
  • 65
  • *if you run 1000 times, the orders are always the same*: No. It depends on the hashCodes and on the order of insertion. Use "AaAa", "BBBB" and "AaBB" in the OP's code, adn you'll have a different order every time. Why? Because these three strings have the same hashCode, and are thus stored in the same bucket, in a linked list, whose order depends on the insertion order. – JB Nizet Jan 31 '18 at 11:34
  • @JBNizet thank you! I don't notice the case of same hashcode – yelliver Jan 31 '18 at 11:43
0

HashSet uses calculated hashCodes to place these strings in a bucketed fashion.

According to the String hashCode() contract, two equal strings will have the same hash code within the same JVM. That means that the hashcode will not change as long as the string does not change.

Having said that, the actual hashCode() implementation has changed from one JVM version to another and/or from one JVM vendor to another. Hence do not fully rely on it even though it is seemingly behaving in a predictable fashion in your case.

String hashCode() JavaDoc:

/** * Returns a hash code for this string. The hash code for a * {@code String} object is computed as * <blockquote><pre> * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] * </pre></blockquote> * using {@code int} arithmetic, where {@code s[i]} is the * <i>i</i>th character of the string, {@code n} is the length of * the string, and {@code ^} indicates exponentiation. * (The hash value of the empty string is zero.) * * @return a hash code value for this object. */

gargkshitiz
  • 2,130
  • 17
  • 19
0

This is in complement to the other answers and comments, but it seems the OP still doesn't understand, so I'll try to give an example.

The structure of a HashSet is an array of buckets. A bucket contains 0, 1, or several elements of the set. If more than 1 element are in a bucket, then they are stored, inside that bucket, in a linked list.

(Note, this is a simplification: HashSet is more complex than that and can start using trees in certain conditions).

When adding an element to a HashSet, the bucket to store that element is chosen, in a deterministic way, based on the hashCode of the element.

So, imagine the HashSet has 7 buckets b1 to b7.

Imagine that you add 3 elements A, B and C to the HashSet.

Imagine that the deterministic function used to choose the buckets returns

  • b1 for A
  • b2 for B
  • b3 for C

You will thus have a structure like

 [
   b1 -> A,
   b2 -> B,
   b3 -> C,
   b4 -> <empty>
   b5 -> <empty>
   b6 -> <empty>
   b7 -> <empty>
 ]

When iterating, the HashSet will not iterate randomly. It will simply go from bucket to bucket, and always print A, then B, then C. Since the function to choose the bucket is deterministic, A, B and C will always be stored respectively in b1, b2 and b3, whatever the insertion order is.

That's why you always get the same order.

Now, suppose that A, B and C have the same hashCode. Or at least, that the result of function used to find the bucket, based on the hashCode, for A, B and C returns the same bucket for A, B and C: b3.

If you insert A, then B, then C, you will end up with

 [
   b1 -> <empty>,
   b2 -> <empty>,
   b3 -> A -> B -> C
   b4 -> <empty>
   b5 -> <empty>
   b6 -> <empty>
   b7 -> <empty>
 ]

But if you insert C, then B, then A, you'll end up with

 [
   b1 -> <empty>,
   b2 -> <empty>,
   b3 -> C -> B -> A
   b4 -> <empty>
   b5 -> <empty>
   b6 -> <empty>
   b7 -> <empty>
 ]

And when iterating over the HashSet, the order will thus be different, depending on the insertion order.

TL;DR: the HashSet is free to order its elements the way it wants to, and you should thus not rely on the order of elements in a HashSet. Just use your List directly, since it's shuffled, and provides an ordering guarantee.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255