0

using this method I created, I don't understand why it returns me a list of integers already in order when on the java doc the set shouldn't order the elements without me telling it.

what this method do: the method given a list of strings as input, returns the set of lengths of the strings in the list

public static Set<Integer> test(List<String> inputList){
    return inputList.stream()
        .map(String::length)
        .collect(Collectors.toSet());
}

doing this on the main

List<String> p2 = List.of("aaaa", "aa", "aaa", "a");
Set<Integer> r2 = test(p2);
System.out.println(r2);

it returns this list : [1, 2, 3, 4] when I would have expected [4, 2, 3, 1]

mansto
  • 43
  • 4
  • 3
    Does this answer your question? [How is this HashSet producing sorted output?](https://stackoverflow.com/questions/18648521/how-is-this-hashset-producing-sorted-output) – teapot418 Aug 25 '23 at 09:01
  • 1
    If you add a `System.out.println(r2.getClass());` you'll see that you are getting a HashSet. Why that appears sorted for small ints, see linked question. Add a bunch of long strings and the order will disappear. – teapot418 Aug 25 '23 at 09:02
  • 1
    To summarize linked question - this is only a coincidence, because integers hash to themselves. To see different ordering, collect like this (for example) - `.collect(Collectors.toCollection(() -> new HashSet<>(3, 3)));`, prints `[4, 1, 2, 3]` on my machine. – Chaosfire Aug 25 '23 at 09:14
  • Coming soon to a JDK near you: [*JEP 431: Sequenced Collections*](https://openjdk.org/jeps/431) – Basil Bourque Aug 25 '23 at 13:40

3 Answers3

2

Sets generally don't have any order you can rely on.

The purpose of a Set is to have a collection of elements that does not allow duplicates without considering the order of the elements. You can essentially add, remove elements and check whether an element is present.

If you want a data structure that is indexed, you should probably use a List. Lists are indexed and have an order.

public static List<Integer> test(List<String> inputList){
    return inputList.stream()
        .map(String::length)
        .collect(Collectors.toList());
}

However, there are specific Set implementations that gurantee a specific order.

For example, if you want a Set that just preserves order, you might want to consider using a LinkedHashSet which is essentially a HashSet that keeps insertion order:

public static Set<Integer> test(List<String> inputList){
    return inputList.stream()
        .map(String::length)
        .collect(Collectors.toCollection(LinkedHashSet::new));
}

If you want the set to be sorted in some way, you can also uses a sorted set implementation like TreeSet:

public static Set<Integer> test(List<String> inputList){
    return inputList.stream()
        .map(String::length)
        .collect(Collectors.toCollection(()->new TreeSet<>(Comparator.naturalOrder().reversed())));
}

In the above code, Comparator.naturalOrder().reversed() creates a Comparator which is used for configuring the order of elements the Set should be sorted in. Comparator.naturalOrder() refers to a Comparator which compares by the "natural order" of the elements which is ascending for Integers. reversed() is then used for reversing the order (descending Integers).

dan1st
  • 12,568
  • 8
  • 34
  • 67
1

The fact that your set is in sorted order is purely coincidental and based on the hash values, the capacity of the set, and the load factor. All of those are mentioned in the JavaDoc for HashSet but explained in more detail HashMap.

Consider the following:

List<Integer> list = List.of(13,12,11,10,9,8,7,6,5,4,3,2,1);
Set<Integer> set = new HashSet<>();
set.addAll(list);
System.out.println(set);

Prints

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Now adjust the size and load capacity to force a different behavior.

set = new HashSet<>(1,10f);
set.addAll(list);
System.out.println(set);

prints

[12, 10, 8, 6, 4, 2, 13, 11, 9, 7, 5, 3, 1]

And if you want to preserve the insertion order use a LinkedHashSet

set = new LinkedHashSet<>();
set.addAll(list);
System.out.println(set);

prints

[13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

The bottom line is that except for some special implementations including TreeSet which wasn't discussed above, Sets are inherently unordered and any appearance that they are is purely coincidental.

WJS
  • 36,363
  • 4
  • 24
  • 39
0

The Answers by dan1st and by WJS are both correct and smart. I'll modify the code seen in the first to take advantage of new features in Java.

tl;dr

Use the new SequencedCollection interface in Java 21+.

public SequencedCollection < Integer > calculateStringLengths ( final List < String > inputList )
{
    return inputList
            .stream ( )
            .map ( ( String s ) -> Math.toIntExact ( s.codePoints ( ).count ( ) ) )  // Calculate the true count of characters in this string.
            .toList ( );
}

Sequenced collections in Java 21+

Java 21 brings some new interfaces and methods to the venerable Java Collections framework. The new features recognize the concept of sequenced collections.

diagram of three new interfaces added to Java 21

For more info, see JEP 431: Sequenced Collections. And see the video by José Paumard on the JEP Café channel on YouTube. See early-access Javadoc.

Let's modify the code shown in the Answer by dan1st to use the new Java 21 features.

Start with this piece:

public static Set<Integer> test(List<String> inputList){
    return inputList.stream()
        .map(String::length)
        .collect(Collectors.toCollection(LinkedHashSet::new));
}

Notice that this method must return either:

  • The overly general Set interface, with no indication of this collection promising to be in a certain order.
  • The overly specific LinkedHashSet class, which remembers insertion order.

Java offers the SortedSet interface, and its successor NavigableSet. These are meant for collections with a sorted order. But LinkedHashSet with its insertion order cannot implement those interfaces.

So, voilà, Java 21 adds the SequencedSet interface, to represent LinkedHashSet as well as SortedSet/NavigableSet implementations such as TreeSet.

Let's pop that into the code snippet from above. And do some reformatting and renaming.

public SequencedSet < Integer > calculateStringLengths ( List < String > inputList )
{
    return inputList
            .stream ( )
            .map ( String :: length )
            .collect (
                    Collectors.toCollection ( LinkedHashSet :: new )
            );
}

And we could make that the more general SequencedCollection.

public SequencedCollection < Integer > calculateStringLengths ( List < String > inputList )
{
    return inputList
            .stream ( )
            .map ( String :: length )
            .collect (
                    Collectors.toCollection ( LinkedHashSet :: new )
            );
}

Le's try it.

List < String > listOfStrings = List.of ( "aaaa" , "aa" , "aaa" , "a" );
SequencedCollection < Integer > stringLengths = this.calculateStringLengths ( listOfStrings );
System.out.println ( "stringLengths = " + stringLengths );

When run:

stringLengths = [4, 2, 3, 1]

Set forbids duplicates

But multiple input strings could very well have the same length. And Set implementations cannot allow duplicates. So really, we should be outputting a List rather than a Set.

Fortunately, no need to change the signature of our calculateStringLengths method — List & ArrayList implement SequencedCollection. So we can swap:

            .collect (
                    Collectors.toCollection ( LinkedHashSet :: new )
            );

… with:

            .toList() ;

… like this:

public SequencedCollection < Integer > calculateStringLengths ( List < String > inputList )
{
    return inputList
            .stream ( )
            .map ( String :: length )
            .toList() ;
}

We get the same results.

stringLengths = [4, 2, 3, 1]

Use code points, not char

Unfortunately, you used the String#length method which is essentially broken. Add a character such as to your input strings to see incorrect results.

That length method uses the char type. As a 16-bit value, that char type is physically incapable of representing most characters. Instead of char use code points.

Replace String :: length with s.codePoints ( ).count ( ). But that call returns a long, so we must either change < Integer > to < Long > or we must cast the long to an int since a string’s length cannot be more than Integer.MAX_VALUE. For defensive programming, we can be extra safe by using Math.toIntExact instead of a direct cast, as this method throws an exception in case of integer overflow.

public SequencedCollection < Integer > calculateStringLengths ( final List < String > inputList )
{
    return inputList
            .stream ( )
            .map ( ( String s ) -> Math.toIntExact ( s.codePoints ( ).count ( ) ) )  // Calculate the true count of characters in this string.
            .toList ( );
}

Let's try this final version of our code. We add as a test of our more accurate string-length calculation.

List < String > listOfStrings = List.of ( "aaaa" , "aa" , "aaa" , "a" , "" );
SequencedCollection < Integer > stringLengths = this.calculateStringLengths ( listOfStrings );
System.out.println ( "stringLengths = " + stringLengths );

stringLengths = [4, 2, 3, 1, 1]

Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154