Java 8 stream unique Integers

Question

Can i reduce the following code to one/two line?

DTO dto;
List<DTO> dtos;
List<Integer> list1 = dtos.stream().map(DTO::getFirstId).distinct().collect(Collectors.toList());
List<Integer> list2 = dtos.stream().map(DTO::getSecondId).distinct().collect(Collectors.toList());

List<Integer> reducedId = list1.stream().filter(list2::contains).collect(Collectors.toList());

Possible, probably. I'm not sure that you should though, unless you want to trade readability for a single line of code. Which you shouldn't. — FrederikVH, Feb 24 '17 at 13:56

score 10 · Answer 1 · answered Feb 24 '17 at 13:58

Using a single Java 8 stream is not a great choice here. Instead you should first create a Set so that you can perform an efficient contains test.

Set<Integer> secondIds = dtos.stream().map(DTO::getSecondId).collect(Collectors.toSet());
List<Integer> reducedIds = dtos.stream().map(DTO::getFirstId).distinct()
        .filter(secondIds::contains).collect(Collectors.toList());

score 4 · Answer 2 · answered Feb 24 '17 at 17:39

You may force it into one stream operation, but the performance would be even worse than what you have now, i.e. an operation with quadratic time complexity.

A better approach would be:

Set<Integer> set = dtos.stream().map(DTO::getSecondId).collect(Collectors.toSet());
List<Integer> result = dtos.stream().map(DTO::getFirstId)
    .distinct().filter(set::contains).collect(Collectors.toList());
// result now contains the common IDs

By collecting the second IDs into a Set instead of a List, you don’t need to use distinct() in the first stream operation and avoid the linear search applied to every element in the second stream operation when contains is invoked.

You may generally consider using a Set for remembering unique IDs. When using a Set as result type, you may avoid the distinct() of the second stream operation too:

Set<Integer> set = dtos.stream().map(DTO::getSecondId).collect(Collectors.toSet());
Set<Integer> result = dtos.stream().map(DTO::getFirstId)
    .filter(set::contains).collect(Collectors.toSet());

If you suspect lots of duplicate IDs and want to keep the behavior of sorting out duplicates before checking the other Set, you may use:

Set<Integer> set = dtos.stream().map(DTO::getSecondId).collect(Collectors.toSet());
Set<Integer> result = dtos.stream().map(DTO::getFirstId)
    .collect(Collectors.toCollection(HashSet::new));
result.retainAll(set);

Note that if you prefer long, hard too read “one liner”, you can inline the set variable in all variants.

score 3 · Answer 3 · edited Mar 31 '17 at 15:37

If you're open to using a third-party library, you have a few options using Eclipse Collections. If you want to use Stream, the following should work using Collectors2.

MutableList<Integer> result =
    dtos.stream().map(DTO::getFirstId).collect(Collectors2.toSet())
        .intersect(dtos.stream().map(DTO::getSecondId).collect(Collectors2.toSet()))
        .toList();

Eclipse Collections also has its only LazyIterable type that can be used instead of Stream.

LazyIterable<DTO> iterable = LazyIterate.adapt(dtos);
MutableList<Integer> result =
    iterable.collect(DTO::getFirstId).toSet()
        .intersect(iterable.collect(DTO::getSecondId).toSet())
        .toList();

Finally, if you have large numbers of ids, you might want to use primitive sets to avoid boxing Integer objects. You can either work with Stream and Collectors2 again as follows:

IntSet second = dtos.stream().collect(
    Collectors2.collectInt(DTO::getSecondId, IntSets.mutable::empty));
IntList result = dtos.stream().collect(
    Collectors2.collectInt(DTO::getFirstId, IntSets.mutable::empty))
        .select(second::contains).toList();

Or you can use LazyIterable as follows:

LazyIterable<DTO> iterable = LazyIterate.adapt(dtos);
IntList result =
    iterable.collectInt(DTO::getFirstId).toSet()
        .select(iterable.collectInt(DTO::getSecondId).toSet()::contains)
        .toList();

Note: I am a committer for Eclipse Collections.

user_3380739 · Answer 4 · 2023-01-08T04:12:04.040

With StreamEx:

Set<Integer> set = StreamEx.of(dtos).map(DTO::getSecondId).toSet();
List<Integer> result = StreamEx.of(dtos).map(DTO::getFirstId)
                                        .filter(set::contains).distinct().toList();

Or by abacus-common

Set<Integer> set = Stream.of(dtos).map(DTO::getSecondId).toSet();
List<Integer> result = Stream.of(dtos).map(DTO::getFirstId)
                                        .filter(set::contains).distinct().toList();

// OR:

List<Integer> result = Stream.of(dtos).map(DTO::getFirstId)
       .intersection(Stream.of(dtos).map(DTO::getSecondId).toSet()).toList();

score 1 · Answer 5 · answered Feb 24 '17 at 13:55

1

I think you could do something like

List<Integer> reducedId = dtos.stream().map(DTO::getFirstId).distinct().filter(
    (dtos.stream().map(DTO::getSecondId).distinct().collect(Collectors.toList()))::contains
).collect(Collectors.toList());

Not tested in my local, but it seems reasonable to me :)

answered Feb 24 '17 at 13:55

dylaniato

516
9
23

you're right Patrick, I think your solution it's the best possible one. – dylaniato Feb 24 '17 at 14:05
2

@Patrick Parker: no, the second IDs are not rebuilt every time. When replacing the inner `Collectors.toList()` with `Collectors.toSet()`, it will do exactly the same as your solution, just less readable. When you have a method reference of the form `expression::name`, the expression will be evaluated only once and the result captured for the function, see also [here](http://stackoverflow.com/a/28025717/2711488), just like with `for(Type var: expression) …`, the `expression` will only be evaluated once, not in every iteration – Holger Feb 24 '17 at 17:44
@Holger ah, good catch. I just glanced quickly and saw a stream() within a stream(). This answer is a bit cleverer than I first imagined. thanks – Patrick Parker Feb 24 '17 at 17:56
@Holger: It does. the second Ids will be rebuilt every time. it's can be verified by adding peek: ...dtos.stream().map(DTO::getSecondId).distinct().peek(System.out::println)... – user_3380739 Feb 24 '17 at 18:51
1

@user3380739: maybe you have a different understanding of “every time”. – Holger Feb 24 '17 at 18:56
@Holger: I think we are on same page: "every time" = "every iteration". please refer to below sample: the second code is much faster the first one: final List list1 = IntList.range(0, 10000).toList(); IntList.range(1, 1000).toList().stream().filter((list1.stream().collect(Collectors.toSet()))::contains).collect(Collectors.toList()); final Set set = list1.stream().collect(Collectors.toSet()); IntList.range(1, 1000).toList().stream().filter(set::contains).collect(Collectors.toList()); – user_3380739 Feb 24 '17 at 19:02
1

@user3380739: interesting, so you don’t suggest to use `peek` anymore, but to repeat your flawed “benchmark”? Just swap the order of the two… – Holger Feb 24 '17 at 19:54
1

@user3380739: by the way, `IntList.range(1, 1000).toList().stream()` doesn’t look like an advantage over `IntStream.range(1, 1000).boxed()` to me. Rather like a case of 3rd-party-library overdose… – Holger Feb 24 '17 at 19:57
@Holger, peek is the very first/simple step to verify it. 'benchmark' is a double confirmation. I has the utility class to do it in one line in two seconds. :-): final List list1 = IntList.range(1, 100000).toList(); Profiler.run(3, 1000, 3, () -> IntList.range(1, 1000).toList().stream().filter((list1.stream().collect(Collectors.toSet()))::contains).collect(Collectors.toList())) .printResult(); – user_3380739 Feb 24 '17 at 20:06
2

@user3380739: `peek` alone is sufficient to prove that method references work as specified, for `expression::name`, the expression is evaluated only once. If you experience other results, you have a broken compiler. – Holger Feb 24 '17 at 20:13
@Holger: interesting... I misunderstood your previous comments. i thought it's rebuilt every time after verified it by peek in my test and that's why i ran the benchmark. After checked your code and verified my test again.you're right. the answer is no. but i'm stuck on the benchmark test: Take a look test: http://ideone.com/5KAh6A. the second solution is at least 5 times faster than the first one on my laptop. but i can't figure out why. (BTW, i know the benchmark test is not good. but i don't think it matters for this discussion) – user_3380739 Feb 24 '17 at 23:22
@user3380739: on Ideone, it prints zero two times as assertion are disabled by default. Besides that, [I already wrote “just swap the order of the two”](http://stackoverflow.com/questions/42440189/java-8-stream-unique-integers/42440427?noredirect=1#comment72037954_42440427), as it is not unusual for this kind of “benchmark”, that the second operations runs faster than the first. There are several Q&As explaining the issue on [SO]. – Holger Feb 27 '17 at 09:33

Java 8 stream unique Integers

5 Answers5