66

In java 8, what's the best way to check if a List contains any duplicate?

My idea was something like:

list.size() != list.stream().distinct().count()

Is it the best way?

Michele Dorigatti
  • 811
  • 1
  • 9
  • 17
pedrorijo91
  • 7,635
  • 9
  • 44
  • 82

6 Answers6

74

Your code would need to iterate over all elements. If you want to make sure that there are no duplicates simple method like

public static <T> boolean areAllUnique(List<T> list){
    Set<T> set = new HashSet<>();

    for (T t: list){
        if (!set.add(t))
            return false;
    }
    
    return true;
}

would be more efficient since it can give you false immediately when first non-unique element would be found.

This method could also be rewritten using Stream#allMatch which also is short-circuit (returns false immediately for first element which doesn't fulfill provided condition)
(assuming non-parallel streams and thread-safe environment)

public static <T> boolean areAllUnique(List<T> list){
    Set<T> set = new HashSet<>();
    return list.stream().allMatch(t -> set.add(t));
}

which can be farther shortened as @Holger pointed out in comment

public static <T> boolean areAllUnique(List<T> list){
    return list.stream().allMatch(new HashSet<>()::add);
}
Pshemo
  • 122,468
  • 25
  • 185
  • 269
  • 49
    This can be even a one-liner: `return list.stream().allMatch(new HashSet<>()::add);` – Holger May 05 '15 at 13:10
  • 2
    @Holger Wow, I never though about that. Every day I learn something new :) Thanks! – Pshemo May 05 '15 at 13:11
  • 8
    Seems a bit dangerous to use a predicate with side effects here. – Stuart Marks May 05 '15 at 14:17
  • @StuartMarks Can you say something more about this subject? Are you referring to both `allMatch` solutions or only `allMatch(new HashSet<>()::add)`? – Pshemo May 05 '15 at 14:21
  • 10
    Both. The spec requires the predicate to be stateless. The usual caveat about running in parallel applies. ConcurrentHashMap might help. There may be other issues but I haven't had coffee yet. :-) – Stuart Marks May 05 '15 at 14:24
  • Yes, concurrency can make life harder. My answer is based on big assumption that it will be used by single thread without parallel stream, just like shown in first code example. This means that list will also not change while iterating. – Pshemo May 05 '15 at 14:54
  • 7
    @Stuart Marks: I think, for a one liner which spans the entire life cycle of the `Stream`, it’s acceptable to rely on the programmer to recognize whether that stream use is multi-threaded or not. Of course, if in doubt, using a `ConcurrentMap` might help. And a stateful predicate might not be the best thing, but is sometimes unavoidable, distinct tests being *the* common example. Maybe you remember [this one](http://stackoverflow.com/a/27872852/2711488) ;^) – Holger May 05 '15 at 17:36
  • 5
    @Holger Of course I remember. :-) But using a stateful predicate requires a certain amount of caution. I think it's likely that somebody will copy and paste this code into a different context, without understanding its restrictions, and introduce bugs that way. – Stuart Marks May 05 '15 at 20:28
  • 3
    Perhaps a brief note is due in the answer regarding @StuartMarks' concern. – Radiodef May 05 '15 at 20:32
  • thank you all. just for some context, I was thinking just in a simple case, no concurrency problems :) – pedrorijo91 May 06 '15 at 09:52
  • 1
    @Holger, @Pshemo, @StuartMarks, why not change to `stream.seguential().allMatch(new HashSet<>()::add);`? Will it be always correct? And I guess no overhead? – Sasha Dec 05 '16 at 14:53
  • whats wrong with return list.size() == new HashSet<>(list).size(); as suggested by @Sasha? – Mubashar Nov 27 '17 at 01:02
  • 6
    @MubasharAhmad It is not that it is *wrong*, but `new HashSet<>(list)` will iterate over entire `list`, while `.allMatch(new HashSet<>()::add)` is short-circuit, it will return `false` at first occurrence of duplicate element. – Pshemo Nov 27 '17 at 01:24
  • @Pshemo: Thanks this was the answer I was looking for – Mubashar Nov 27 '17 at 01:39
  • by the way what if someone is looking for the inverse version of it e.g. it should return true if any value is duplicate? one way is to use the above with not sign at the start but should I try anyMatch function. Just curious – Mubashar Nov 27 '17 at 22:20
  • Using Set and `anyMatch` would mean Predicate should return `false` for every *successful* adding to set and true otherwise. So we can't use nice `new HashSet<>()::add` if we want to add negation there (unless we will explicitly cast it to `Predicate` to be able to use `negate()` method like `list.stream().anyMatch(((Predicate) new HashSet<>()::add).negate())`. Without it we would need to create `Set` before stream and use lambda like `list.stream().anyMatch(elem -> !set.add(elem))`. – Pshemo Nov 27 '17 at 23:10
  • If the list is sorted, you only have to look at two items next to each other, not the whole result. – Thorbjørn Ravn Andersen May 03 '18 at 09:03
  • 2
    @Holger Can you help me understand how the one-liner works? I can see how it would work if we had a `final HashSet` the predicate refers, but how come the elements are being added to the same HashSet with the current way? – Koray Tugay Jul 14 '19 at 18:43
  • Instead of `list.stream().allMatch(t -> set.add(t));` you can also do `list.stream().allMatch(set::add))` – Koray Tugay Jul 14 '19 at 18:53
  • 5
    @KorayTugay a method reference of the form `expression::name` will evaluate `expression` and capture its result when the instance of the functional interface (here, a `Predicate`) is created and then reused throughout the entire lifetime of that object. See [What is the equivalent lambda expression for System.out::println](https://stackoverflow.com/q/28023364/2711488) or [this Q&A](https://stackoverflow.com/q/37979731/2711488)… – Holger Jul 15 '19 at 11:23
19

I used the following:
1. return list.size() == new HashSet<>(list).size();.

I'm not sure how it compares to:
2. return list.size() == list.stream().distinct().count();
and
3. return list.stream().sequential().allMatch(new HashSet<>()::add);
in terms of performance.

The last one (#3) has possibility to handle not only collections (e.g. lists), but also streams (without explicitly collecting them).

Upd.: The last one (#3) seems to be the best not only because it can handle pure streams, but also because it stops on the first duplicate (while #1 and #2 always iterate till the end) — as @Pshemo said in comment.

Sasha
  • 3,599
  • 1
  • 31
  • 52
10

You can use the counting collector.

Stream.of(1, 3, 4, 6, 7, 5, 6)
            .collect(Collectors.groupingBy(
                    Function.identity(), Collectors.counting()))
            .entrySet().stream().anyMatch(e -> e.getValue() > 1)
Will Humphreys
  • 2,677
  • 3
  • 25
  • 26
3

Given array arr,

arr.length != Arrays.stream(arr).distinct().count()

will help check for duplicates

Pshemo
  • 122,468
  • 25
  • 185
  • 269
Edor Linus
  • 826
  • 8
  • 9
2

Started this class as a StreamTool, but I think there must be an even better way with reduce or similar:

public class StreamTool {

    /**
     * Whether stream records are unique in that stream.
     * @param <T> Type of records
     * @param records
     * @return true if there are no duplicates, false otherwise
     */
    public static <T> boolean isUnique(Stream<T> records) {
        return records.allMatch(new HashSet<>()::add);
    }
}
geekdenz
  • 759
  • 1
  • 5
  • 8
0

Use set.add() it is faster.

Set<T> items = new HashSet<>();
list.stream().filter(n -> !items.add(n)) 
            .collect(Collectors.toSet());
itro
  • 7,006
  • 27
  • 78
  • 121
  • What is the point of creating two Set objects (one via `new HashSet` and other via `Collections.toSet()`? Also above code could be reduce to `new HashSet<>(list)`, *but* question is **not** about *removing* duplicates, it is about checking if duplicate elements *exist*. – Pshemo Apr 22 '22 at 14:29