1
Set<String> unique = new HashSet<>();
List<String> duplicates = new ArrayList<>();

for (Animal cat: animals) {
    if (!unique.add(cat.getName())) {
        duplicates.add(cat.getName());
    }
}
return duplicates;

I was wondering if there was a way to simplify this? I'm new to Java streams and I tried using Map but I resorted to the traditional for loop instead.

Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
RonApple1996
  • 269
  • 3
  • 13
  • What's the declared type of `animals`? – ernest_k Aug 07 '18 at 18:41
  • you could call `animals.sort()` then iterate on the list : if current element equals the precedent element then it is a duplicate. – Arnaud Denoyelle Aug 07 '18 at 18:48
  • What is it that you're trying to do? If this is a method that returns all animals (or cats) of which there is more than one, you can use the `count` method. – SeverityOne Aug 07 '18 at 18:50
  • 1
    Possible duplicate of [Java 8, Streams to find the duplicate elements](https://stackoverflow.com/questions/27677256/java-8-streams-to-find-the-duplicate-elements) – Mick Mnemonic Aug 07 '18 at 19:01
  • 1
    A question was raised in a comment to a now-deleted answer. If `animals` is a list with `[A, B, A, C, B, A]`, your code currently returns `[A, B, A]`. Is that your actual intent (i.e. 2 A's, not 1 or 3), and is the result required to be in that order? – Andreas Aug 07 '18 at 19:32

6 Answers6

6

I was wondering if there was a way to simplify this? I

The stream way is probably not what you need for your requirement and your code is actually simple.

Streams allow to pass from an input (Stream<Foo>) to a result (Foo, List<String>...) thanks to multiple intermediary operations (filter, map, reduce, stream a collected result...).
Each stream operation relies on the returned stream of the next operation :
To simplify it would give a chain :

a -> b (use a)-> c (uses b)-> d (uses c)-> e (uses d)

Actually you code cannot rely on such a logic because the returned List<String> doesn't need only to return the list of names that have duplicates, that could be written such as :

List<String> duplicates =     
    animals.stream()
           .collect(Collectors.groupingBy(Animal::getName))
           .entrySet().stream()
           .filter(e -> e.getValue().size() > 1)
           .map(Entry::getKey)
           .collect(Collectors.toList());

you want return a List of each duplicate appearition in the order these occur.
It means that you don't map Stream<Animal> -> Stream<String> such as

a->b (uses a)

because you need to filter out the element if the Animal name was not added in the result ... but a stream is not designed to populate incrementally the result. So you are stuck.


You could write this one but as said that is really not a simplification and still it doesn't apply the same logic as the order of dup names is not the same as in your code :

List<String> duplicates = 
          animals.stream()
                 .collect(Collectors.groupingBy(Animal::getName, LinkedHashMap::new, Collectors.toList()))
                 .values().stream()
                 .flatMap(l-> l.stream().skip(1)) 
                 .map(Animal::getName)
                 .collect(Collectors.toList());
davidxxx
  • 125,838
  • 23
  • 214
  • 215
3

Do you want to extract the duplicate String names of the list of Animal according to their names? Although your code doesn't involve the first found duplicate and returns the List where the duplicates are in the n-1 count, here it is:

Set<String> set = new HashSet<>();
List<String> names = animals.stream()
                            .map(cat -> cat.getName())      // Names to collect and compare
                            .filter(name -> !set.add(name)) // Collect duplicates
                            .collect(Collectors.toList());  // To the List

This solution is based on your for-loop and performs the same thing. However, the documentation of Stream API states that the constructs should be non-infering and stateless - which means independent to sources which could change the state.


Here is an alternative working way in regard to the Stream-API documentation - but a bit complicated:

List<String> names = animals.stream()
    .collect(Collectors.groupingBy(
         Animal::getName, 
         Collectors.counting()))            // Collects to Map <name, count>
    .entrySet()                             // Gets the entries
    .stream()                               // Stream of the entries
    .filter(entry -> entry.getValue() > 1)  // Filters the duplicates
    .map(entry -> Collections.nCopies(      // Creates n-1 copies of the key as same as the
        entry.getValue().intValue() - 1,    //   OP's sample consumes the first duplication
        entry.getKey()))                    //   and includes the remainin ones
    .flatMap(List::stream)                  // Flattens the structure
    .collect(Collectors.toList());          // Results in the List

Both ways result from the input:

List<Animal> animals = Arrays.asList(
    new Animal("A"), new Animal("A"), new Animal("A"), 
    new Animal("B"), new Animal("B"), new Animal("C"));

The following output (unordered):

[A, B, A]

Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
  • 3
    Although this will work, it totally violates the **stateless** contract of [`filter()`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-): *"Parameters: `predicate` - a [non-interfering](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#NonInterference), [**stateless**](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness) predicate to apply to each element to determine if it should be included"* – Andreas Aug 07 '18 at 18:56
  • For input `1,2,1,3,2,1`: Question code returns `[1, 2, 1]`, but "alternative way" code returns `[1, 2]`. – Andreas Aug 07 '18 at 19:17
  • 1
    This is actually quite similar to the problematic example from the [docs](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness). – shmosel Aug 07 '18 at 19:45
  • Generally I wouldn't say something that violates the spec works, even if it produces the intended result. Though it is a bit unclear whether statelessness is a requirement or a recommendation. – shmosel Aug 07 '18 at 19:52
  • There was some good discussion about predicate statelessness in the comments of [this answer](https://stackoverflow.com/a/30053822/905488), with Stuart Marks also pitching in. In this case (single-threaded, single-use) it's probably an acceptable solution. – Mick Mnemonic Aug 07 '18 at 19:54
  • Finally, I have found the way according to the OP's sample. – Nikolas Charalambidis Aug 07 '18 at 19:56
1

I don't know whether this could be considered a simplification, but here's one way to do it with streams:

return animals.stream()
        .collect(Collectors.groupingBy(Animal::getName))
        .values()
        .stream()
        .flatMap(group -> group.stream().skip(1))
        .map(Animal::getName)
        .collect(Collectors.toList());
shmosel
  • 49,289
  • 6
  • 73
  • 138
0

Do not re-invent the wheel and use libraryies like commns-collection.substract()

// I have not tested the code, but I think you get the idea
Set<> unique = new HashSet(animals)
Collection<> dublicates = CollectionUtil.subtract(animals, unique)
thopaw
  • 3,796
  • 2
  • 17
  • 24
  • 2
    imho, a library is a high price to pay for just a method. I think that something is doable with `retainAll()` – Arnaud Denoyelle Aug 07 '18 at 18:50
  • Downvoting as this can easily be done without the usage of a third-party library. – akortex Aug 07 '18 at 18:52
  • @Aris - I don't believe this should be down voted. It's a simple solution for a problem. Others have tackled this issue before, why not use their experience, or as he said "Do not re-invent the wheel" – Dan Aug 07 '18 at 18:55
  • The need of a library is only necessary as javas collections are not as sophisticated as in other languages. Still with java 8 and lambdas the use filtering, joining .etc. of collections is very low level compared to other languages IMHO. – thopaw Aug 07 '18 at 18:56
  • 1
    @Dan no it's not. If you're presented with a bit of code that may actually exist somewhere (supposing this is already part of a project) and asked to make it better, your first action would not be to strip it out and go over to a third party library. There is native JDK support for such actions and the various data structures available are there to also help. OP asked how to improve something already existing, not how to throw it away and move to a lib. – akortex Aug 07 '18 at 18:58
  • @Aris, no one is talking about ripping out code. You would just need to import the Collection-utils lib and have the power to Apache by your side. They've really helped our projects out in the past with their utility methods – Dan Aug 07 '18 at 19:10
  • 1
    @Aris It may not be worth including a library for this case alone, but it's perfectly reasonable to suggest it as an option in case OP or anyone else is already including it. The only reason I would say it's irrelevant is that OP specifically asked about streams. Then again, people don't always know what's best for them. – shmosel Aug 07 '18 at 19:16
  • OP was asking about how to replace an already existing piece of code with some implemented using core functions of newer JDK versions. Replying with a suggestion to just import a third party library seems very counter-intuitive and also misses the original point of the question. One could have easily suggested Guava as a viable alternative instead. The point is not to suggest a ready made solution (lib) but rather an solution based on using core JDK stuff. – akortex Aug 07 '18 at 23:00
  • Furthermore and just to clarify on this, I'm not saying that libs are not useful, but in most cases (especially for beginners) they tend to hide important algorithmic implementation details. For example I don't see why once could not have attained the same functionality just by using `Collection#removeAll` and two Sets. – akortex Aug 07 '18 at 23:04
0

there is always a way — still not simple, but much shorter:

List<Animal> duplicates = animals.stream()
  .collect( Collectors.collectingAndThen( Collectors.groupingBy( Animal::getName ),
    map -> {
      map.entrySet().removeIf( e -> e.getValue().size() < 2 );
      return( map.values().stream().flatMap( List::stream ).collect( Collectors.toList() ) );
    } ) );
Kaplan
  • 2,572
  • 13
  • 14
-1

The question is "how to do it with Streams?" but I consider that a good answer is "Streams are not always a simplification".

The problem of detecting duplicates is classic and there is a canonic way of doing it :

  • Sort the collection.
  • Iterate on it.
  • Every element that equals its predecessor is a duplicate.

So, although it does not really answer the question, the right way of doing it is imho like this :

List<Animal> animals =
        Arrays.asList(
                new Animal("Alice"), 
                new Animal("Alice"), 
                new Animal("Alice"), 
                new Animal("Bob"),
                new Animal("Charlie"), 
                new Animal("Bob"));

List<Animal> duplicates = new ArrayList<>();

animals.sort(Comparator.comparing(Animal::getName));
for (int i = 1; i < animals.size(); i++) {
    Animal current = animals.get(i);
    if (animals.get(i - 1).getName().equals(current.getName())
            //Bonus : Also compare to the previous-previous in order to avoid multiple duplicates
            && (i < 2 || !animals.get(i - 2).getName().equals(current.getName()))) {
        duplicates.add(current);
    }
}

duplicates.forEach(a -> System.out.println(a.getName()));

Output:

Bob
Alice

It might not be easier to understand (depends on your experience) but is by far cleaner than creating an intermediate HashMap for the sake a using a Stream.

So, either do it like this (for performance) or do it as you already do (for readability).

Arnaud Denoyelle
  • 29,980
  • 16
  • 92
  • 148
  • Multiple compilation errors. 1) You're assuming class `Animal` has a natural order. Even if it does, you're also assuming the order is by name. --- 2) `String current = animals.get(i)`: *Type mismatch: cannot convert from Animal to String*. --- 3) `duplicates` is a `Set`, but result should be a `List`. – Andreas Aug 07 '18 at 19:23
  • @Andreas Can you double-check? It executes nicely on my side (using JDK 10 but should be fine from JDK 8 onwards). – Arnaud Denoyelle Aug 07 '18 at 19:25
  • @Andreas ok. That is because I consider `animals` as a `List`which is not what the OP uses. I adapt the code and edit the answer. – Arnaud Denoyelle Aug 07 '18 at 19:27
  • Silently assuming different input makes this answer useless. – Andreas Aug 07 '18 at 19:27
  • The code in the question would return `[Alice, Alice, Bob]`, so your result is incorrect. – Andreas Aug 08 '18 at 00:02
  • @Andreas This is intended. Remove the `//Bonus` condition and the result it the one from the question. – Arnaud Denoyelle Aug 08 '18 at 07:49