3

Let's prefix this by my objects equals implementation is not how I need to filter so distinct itself does not work.

class MyObject {
  String foo;
  MyObject( String foo ) {
    this.foo = foo;
  }
  public String getFoo() { return foo; }
}


Collection<MyObject> listA = Arrays.asList("a", "b", "c").stream().map(MyObject::new)
        .collect(Collectors.toList());

Collection<MyObject> listB = Arrays.asList("b", "d").stream().map(MyObject::new)
        .collect(Collectors.toList());


// magic

How can I merge and deduplicate the lists so that the resulting list should be of MyObjects containing "a", "b", "c", "d"?

Note: This is a simplification of what methods we actually need to deduplicate, which are actually complex DTOs of entities loaded by hibernate, but this example should adequately demonstrate the objective.

xenoterracide
  • 16,274
  • 24
  • 118
  • 243
  • Get the distinct set before mapping the values? – biziclop Sep 29 '15 at 22:03
  • @biziclop consider that these are entity lists loaded from hibernate. I left that out because I think it adds unnecessary complexity to the declaration of the problem. I don't actually have a list of strings that I'm mapping to a complex set of objects and then trying to distinct. I just have complex objects. – xenoterracide Sep 29 '15 at 22:05
  • 1
    Have you considered using Guava and its Equivalence? – fge Sep 29 '15 at 22:05
  • @fge I have not, we do have guava... though to some extent, we're trying to get away from guava. we know how to do this with a loop... – xenoterracide Sep 29 '15 at 22:13
  • @TagirValeev since 80% of the problem was dedup I'm not going to put up a lot of fight, but the difference between my question and the dupe include "merging multiple lists", and we are dealing with multiple properties (though I'm not sure that distinction is all that relevant). – xenoterracide Sep 30 '15 at 13:32

3 Answers3

3

Such feature is discussed by JDK developers (see JDK-8072723) and might be included in Java-9 (though not guaranteed). The StreamEx library developed by me already has such feature, so you can use it:

List<MyObject> distinct = StreamEx.of(listA).append(listB)
                                  .distinct(MyObject::getFoo).toList();

The StreamEx class is an enhanced Stream which is completely compatible with JDK Stream, but has many additional operations including distinct(Function) which allows you to specify key extractor for distinct operation. Internally it's pretty similar to the solution proposed by @fge.

You can also consider writing custom collector which will combine getting distinct objects and storing them to list:

public static <T> Collector<T, ?, List<T>> distinctBy(Function<? super T, ?> mapper) {
    return Collector.<T, Map<Object, T>, List<T>> of(LinkedHashMap::new,
        (map, t) -> map.putIfAbsent(mapper.apply(t), t), (m1, m2) -> {
            for(Entry<Object, T> e : m2.entrySet()) {
                m1.putIfAbsent(e.getKey(), e.getValue());
            }
            return m1;
        }, map -> new ArrayList<>(map.values()));
}

This collector intermediately collects the results into Map<Key, Element> where Key is the extracted Key and Element is the corresponding stream element. To make sure that exactly first occurring element will be preserved among all repeating ones, the LinkedHashMap is used. Finally you just need to take the values() of this map and dump them into the list. So now you can write:

List<MyObject> distinct = Stream.concat(listA.stream(), listB.stream())
                                .collect(distinctBy(MyObject::getFoo));

If you don't care whether the resulting collection is list or not, you can even remove the new ArrayList<>() step (just using Map::values as a finisher). Also more simplifications are possible if you don't care about order:

public static <T> Collector<T, ?, Collection<T>> distinctBy(Function<? super T, ?> mapper) {
    return Collector.<T, Map<Object, T>, Collection<T>> of(HashMap::new,
            (map, t) -> map.put(mapper.apply(t), t), 
            (m1, m2) -> { m1.putAll(m2); return m1; }, 
            Map::values);
}

Such collector (preserving the order and returning the List) is also available in StreamEx library.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • 1
    The question is, whether such a big gun is really necessary. If you know, that you won’t use parallel processing, a simple `.filter(new TreeSet<>(Comparator.comparing(MyObject::getFoo))::add)` will do… – Holger Sep 30 '15 at 09:20
  • 1
    @Holger, the question is whether the gun is big. If you already use my library, `.distinct(MyObject::getFoo)` looks much cleaner and easier to understand than your proposed solution. If not and don't want to add new dependencies, that's another story. Though I would still prefer a custom collector in this case (if collecting to list is wanted after distinct operation) as it's likely to be faster and will not violate the [`filter`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-) contract. – Tagir Valeev Sep 30 '15 at 09:37
1

If .equals() does not work for you then you may want to have a go at using Guava's Equivalence.

Provided that your type is T, you need to implement an Equivalence<T>; once you have this, you need to create a:

Set<Equivalence.Wrapper<T>>

into which you'll gather your values. Then, provided your implementation of Equivalence<T> is some static variable named EQ, adding to this set is as simple as:

coll1.stream().map(EQ::wrap).forEach(set::add);
coll2.stream().map(EQ::wrap).forEach(set::add);

And then to obtain a List<T> from this set, you could:

final Set<T> unwrapped = set.stream().map(Equivalence.Wrapper::get)
    .collect(Collectors.toSet());

But of course, since in your comments you say you can do it with a loop, well... Why not keep using that loop?

If it works, don't fix it...

fge
  • 119,121
  • 33
  • 254
  • 329
  • because said loop revolved around arcane nested loops, and sometimes you're just trying to level up and learn something you don't know – xenoterracide Sep 29 '15 at 22:22
  • 1
    Well then, an `Equivalence` is a good fit; meh, sorry that it binds you to Guava but this class is really a gem ;) – fge Sep 29 '15 at 22:24
  • 1
    Note that a cheap, easy replacement for an `Equivalence` is some class `D` which accepts a `T` as an argument and can still return it all the while implementing the equals/hashcode contract correctly. – fge Sep 29 '15 at 22:29
  • 1
    `Stream.of( coll1, coll2 ).map(EQ::wrap).distinct().map(equivalence::get).collect(Collectors.asList());` would seem perhaps even better than an intermediate set. – xenoterracide Sep 29 '15 at 22:36
  • 1
    Except you can't `new Stream()` since `Stream` is an interface... – fge Sep 29 '15 at 22:37
  • corrected, not `new`, `of`. from http://stackoverflow.com/a/23039457/206466 – xenoterracide Sep 29 '15 at 22:41
  • 1
    Well, yes, that works too. Anyway, as you can see you have lots of options, just choose the one which best suits you :) – fge Sep 29 '15 at 22:43
1
Collection<MyObject> result = Stream.concat(listA.stream(), listB.stream())
                              .filter(distinct(MyObject::getFoo))
                              .collect(Collectors.toList());

public static <T> Predicate<T> distinct(Function<? super T, Object> keyExtractor) {
        Map<Object, String> seen = new ConcurrentHashMap<>();
        return t -> seen.put(keyExtractor.apply(t), "") == null;
    }

I found this distinct function once in a blog (can't remember the link atm).

Spotted
  • 4,021
  • 17
  • 33
  • It's probably copied from [Stuart Marks](http://stackoverflow.com/a/27872086/4856258) answer. This solution does not preserve order in parallel and has unnecessary CHM overhead in sequential. Nevertheless it's quite simple. – Tagir Valeev Sep 30 '15 at 08:40