0

I'm not even sure if this is possible but I'm doing a Set operation like union or intersection and I need to convert that to a List in order to shuffle the list and pass that to a different method that accepts a List instead of a Set. So I convert the result to a List and all is good. But then from the profiler, I see that the operation is taking a long time under load and it's because of the way Guava Sets .size() does. It's not a constant operation like normal java Set.

Here's the code example

    @Test
    void testSet() {
        Set<Character> first = ImmutableSet.of('a', 'b', 'c');
        Set<Character> second = ImmutableSet.of('b', 'c', 'd');

        Set<Character> union = Sets.union(first, second);

        List<Character> characters = new ArrayList<>(union);
    }

I'm trying to find the fastest way to convert a Guava Sets to a List. From digging through the code this is what Guava Sets does https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Sets.java#L694. It's not a constant operation and it hurts the performance under high load. I'm guessing that .size call is from when Java wants to copy a new Collection to a new List it has to know the size to create a List.

toy
  • 11,711
  • 24
  • 93
  • 176
  • 2
    _"taking a long time under load "_ - How do you know? What do you mean by _"a long time"_? What do you mean by _"under load"_? These are all very vague metrics and you might not be interpreting your profiler's output correctly. That said, I looked at the implementation of `Set#union()` and all I can say is ... wow, disappointing. – Jim Garrison Mar 05 '21 at 05:20
  • Also relevant: https://stackoverflow.com/questions/30373758/guava-sets-intersection-bad-performance – Grzegorz Rożniecki Mar 05 '21 at 08:40
  • 1
    `ImmutableSet.builder().addAll(first).addAll(second).build().asList()` – Louis Wasserman Mar 06 '21 at 19:48

1 Answers1

1

Putting "performance under high load" argument aside (I suggest doing proper JMH microbenchmarks if it's really relevant in your use case), Sets operations are memory-optimized, so if you're copying the data right away, you may want to try different approaches which don't call size at all.

First, Sets.union returns SetView<E> which has immutableCopy(), on which then you can call .asList() view, returning an immutable list (feel free to chain all operations together):

@Test
public void testSetCopy() {
    Set<Character> first = ImmutableSet.of('a', 'b', 'c');
    Set<Character> second = ImmutableSet.of('b', 'c', 'd');

    Sets.SetView<Character> union = Sets.union(first, second);
    List<Character> characters = union.immutableCopy().asList();

    assertThat(characters).containsOnly('a', 'b', 'c', 'd');
}

Second, you can also consider using Set in the first place, as mentioned by Louis:

@Test
public void testMultiset() {
    Set<Character> first = ImmutableSet.of('a', 'b', 'c');
    Set<Character> second = ImmutableSet.of('b', 'c', 'd');

    // here it's ugly but maybe you can collect to set in the first place
    ImmutableMultiset<Character> set = ImmutableSet.<Character>builder()
             .addAll(first)
             .addAll(second)
             .build(); // [a, b, c, d]

    List<Character> characters = set.asList();

    assertThat(characters).containsOnly('a', 'b', 'c', 'd');
}

That said, YMMV, and again I encourage you to do microbenchmarks before choosing any less readable option.

Grzegorz Rożniecki
  • 27,415
  • 11
  • 90
  • 112