4

Setup and question

I have a stream over a simple Java data class like:

class Candidate {
    private Long id;
    private String fullName;
    private String job;
    private String address;
    // constructor and getters omitted
}

I would like to filter my stream by two properties:

  • remove all duplicates by job
  • but keep everyone with address "Italy" regardless of job

Example

Consider an example data set like

ID  fullName        JOB                   address
1   Peter Bright    IT Engineer           Italy
2   Patrick Manon   Electronics engineer  Spain
3   Bob Jina        IT Engineer           Suisse
4   Alexander Layo  Security Engineer     UK

or in Java:

// id is a Long, so the literals need the L suffix (an int literal does not box to Long)
Candidate c1 = new Candidate(1L, "Peter Bright", "IT Engineer", "Italy");
Candidate c2 = new Candidate(2L, "Patrick Manon", "Electronics engineer", "Spain");
Candidate c3 = new Candidate(3L, "Bob Jina", "IT Engineer", "Suisse");
Candidate c4 = new Candidate(4L, "Alexander Layo", "Security Engineer", "UK");

Stream<Candidate> candidates = Stream.of(c1, c2, c3, c4);

I would like to filter the stream in a way that the outcome is:

ID  fullName        JOB                   address
1   Peter Bright    IT Engineer           Italy
2   Patrick Manon   Electronics engineer  Spain
4   Alexander Layo  Security Engineer     UK

Note that Bob Jina got removed since IT Engineer was already there.

If there are duplicate candidates who are all from Italy, we need to keep all of them.

bloudr

3 Answers

2

You could collect to a map of Candidates, using job as the key and keeping Italian candidates whenever there is a collision:

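// Assumes candidates is a Collection<Candidate>; if it is already a Stream, drop the .stream() call.
Collection<Candidate> result = candidates.stream()
    .collect(Collectors.toMap(
        Candidate::getJob,                 // key: the job
        Function.identity(),               // value: the candidate itself
        // on a job collision, keep the first candidate if it is from Italy, otherwise the second
        (c1, c2) -> "italy".equalsIgnoreCase(c1.getAddress()) ? c1 : c2,
        LinkedHashMap::new))               // preserves insertion order
    .values();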

This uses the overload of Collectors.toMap that accepts four arguments.

I'm collecting to a LinkedHashMap to preserve insertion-order.

This solution returns the candidates as a Collection. If you need a List, create it from the collection:

List<Candidate> list = new ArrayList<>(result);
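For reference, here is a minimal runnable sketch of the toMap approach applied to the question's sample data. The class name ToMapDedup, the helper ids, and the compact Candidate stand-in (with a primitive long id) are assumptions made for illustration, not the asker's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDedup {

    // Hypothetical stand-in for the question's Candidate class.
    static final class Candidate {
        final long id;
        final String fullName, job, address;

        Candidate(long id, String fullName, String job, String address) {
            this.id = id;
            this.fullName = fullName;
            this.job = job;
            this.address = address;
        }

        long getId() { return id; }
        String getJob() { return job; }
        String getAddress() { return address; }
    }

    // One candidate per job. On a collision the earlier candidate is kept
    // only if it is from Italy; otherwise the later one replaces it.
    static List<Candidate> dedupByJob(List<Candidate> candidates) {
        return new ArrayList<>(candidates.stream()
            .collect(Collectors.toMap(
                Candidate::getJob,
                Function.identity(),
                (a, b) -> "italy".equalsIgnoreCase(a.getAddress()) ? a : b,
                LinkedHashMap::new))
            .values());
    }

    // Helper for printing and checking: extracts the ids in order.
    static List<Long> ids(List<Candidate> candidates) {
        return candidates.stream().map(Candidate::getId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> input = Arrays.asList(
            new Candidate(1L, "Peter Bright", "IT Engineer", "Italy"),
            new Candidate(2L, "Patrick Manon", "Electronics engineer", "Spain"),
            new Candidate(3L, "Bob Jina", "IT Engineer", "Suisse"),
            new Candidate(4L, "Alexander Layo", "Security Engineer", "UK"));
        System.out.println(ids(dedupByJob(input))); // prints [1, 2, 4]
    }
}
```

Two properties of this merge function are worth knowing: the map holds exactly one candidate per job, so two Italians with the same job collapse into one, and when neither colliding candidate is Italian the later one wins.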
Zabuzard
fps
  • 2
    That's awesome, but in the case of duplicate candidates who are all "Italian", I need to keep all of them! – bloudr Aug 20 '21 at 18:52
  • @SafeMediaWeb That changes your original question and slightly complicates the problem... Do you mean that you want to keep all Italian candidates with the same job? Please clarify – fps Aug 20 '21 at 20:31
  • @SafeMediaWeb Also, what if two candidates with the same job are from two different countries, other than Italy? – fps Aug 20 '21 at 20:37
  • 1
    I am sorry, I just realized that when testing – bloudr Aug 20 '21 at 20:58
  • 1
    Yes, if candidates have the same job and the same location, I need to keep them all – bloudr Aug 20 '21 at 21:00
  • 1
    To answer your question "what if two candidates with the same job are from two different countries, other than Italy?": Italy should appear in this case, don't worry about that – bloudr Aug 20 '21 at 21:02
  • Hello @fps, sorry, but I am getting a NullPointerException when calling your code inside a test function! Do you have any idea why? – bloudr Sep 01 '21 at 13:04
  • @SafeMediaWeb Maybe some candidate instance is null, or its job – fps Sep 01 '21 at 14:00
2

As I understand it, all jobs, regardless of job type and country will be kept unless Italy is involved. In that case, for any given job, only Italy's will be retained.

You can do it like so. I chose two stages.

  • first build a map keyed by jobs.
  • then filter those jobs favoring Italy.

Create the map

// getJob() is used here for consistency with getAddress() below
Map<String, List<Candidate>> jobs = list.stream()
        .collect(Collectors.groupingBy(Candidate::getJob));

Now, check all the lists.

  • if the list for any job does not contain Italy, use that list in its entirety.
  • otherwise, use the list excluding all but Italy.

List<Candidate> favorItaly = jobs.values()
     .stream()
     .map(lst -> {
              List<Candidate> italy = lst
                   .stream()
                   .filter(c -> c.getAddress().equalsIgnoreCase("Italy"))
                   .toList();
              // if nobody with this job is from Italy, keep the whole list; otherwise keep only the Italians
              return italy.isEmpty() ? lst : italy;
      })
     .flatMap(List::stream)
     .toList();

The two stages could have been combined but there is no performance improvement in doing so and this way avoids clutter.
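As a rough illustration of what the combined form might look like, the sketch below folds both stages into one pipeline with Collectors.collectingAndThen. It also passes LinkedHashMap::new to groupingBy so that jobs come out in encounter order, which the default map does not guarantee. The class name FavorItalyCombined, the helper ids, and the compact Candidate stand-in are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

public class FavorItalyCombined {

    // Hypothetical stand-in for the question's Candidate class.
    static final class Candidate {
        final long id;
        final String fullName, job, address;

        Candidate(long id, String fullName, String job, String address) {
            this.id = id;
            this.fullName = fullName;
            this.job = job;
            this.address = address;
        }

        long getId() { return id; }
        String getJob() { return job; }
        String getAddress() { return address; }
    }

    // Groups by job, then reduces each group to its Italians if it has any,
    // otherwise leaves the group untouched.
    static List<Candidate> favorItaly(List<Candidate> candidates) {
        return candidates.stream()
            .collect(Collectors.groupingBy(
                Candidate::getJob,
                LinkedHashMap::new, // keeps jobs in encounter order
                Collectors.collectingAndThen(
                    Collectors.toList(),
                    (List<Candidate> group) -> {
                        List<Candidate> italy = group.stream()
                            .filter(c -> "Italy".equalsIgnoreCase(c.getAddress()))
                            .collect(Collectors.toList());
                        return italy.isEmpty() ? group : italy;
                    })))
            .values().stream()
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    // Helper for printing and checking: extracts the ids in order.
    static List<Long> ids(List<Candidate> candidates) {
        return candidates.stream().map(Candidate::getId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> input = Arrays.asList(
            new Candidate(1L, "Peter Bright", "IT Engineer", "Italy"),
            new Candidate(2L, "Patrick Manon", "Electronics engineer", "Spain"),
            new Candidate(3L, "Bob Jina", "IT Engineer", "Suisse"),
            new Candidate(4L, "Alexander Layo", "Security Engineer", "UK"));
        System.out.println(ids(favorItaly(input))); // prints [1, 2, 4]
    }
}
```

Note that, as in the two-stage version, a job with no Italian candidate keeps all of its candidates, so two non-Italians with the same job both survive.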

WJS
1

Filter

You can achieve this quite easily by using filter and a custom Predicate that is using a Set under the hood to memorize what candidates it already saw.

Also see this highly related answer: Java 8 Distinct by property.

Disclaimer: This solution (like the linked answer) proposes a stateful predicate, which, in general, should be considered bad practice and error-prone for various reasons.


Filter by distinct name

First off, let's create a Predicate that only accepts a candidate if the job name has not been seen yet. For that, let's just steal the code from the post I linked above:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}
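To see the helper in isolation, here is a small sketch that keeps the first word of each length from a list of strings. The class name DistinctByKeyDemo and the method firstPerLength are made up for this illustration; distinctByKey is copied from above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {

    // Returns a stateful predicate: true only the first time a key is seen.
    // The backing set does not accept null keys.
    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps the first word encountered for each distinct length.
    static List<String> firstPerLength(List<String> words) {
        return words.stream()
            .filter(distinctByKey(String::length))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "c", "dd", "eee");
        System.out.println(firstPerLength(words)); // prints [a, bb, eee]
    }
}
```

Each call to distinctByKey creates a fresh Set, so the predicate should be created once per stream pipeline and not reused across pipelines.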

We create our predicate by writing distinctByKey(Candidate::getJob).


Accept all Italians

The last part is to combine this predicate with a predicate that just accepts anyone from Italy, regardless of the job name:

Predicate<Candidate> acceptItalian = candidate -> candidate.getAddress().equals("Italy");

Putting it together

Now let's combine both using Predicate#or and use filter on the stream:

Stream<Candidate> candidates = ...
Stream<Candidate> filtered = candidates.filter(
    acceptItalian.or(distinctByKey(Candidate::getJob))
);

And now you have your filtered stream. Call .toList() on it to get a List, for example.

Note that it's important that we put the two predicates together in that order. Otherwise the Italians would pollute the Set and lock out non-Italians with the same job name. The functionality we rely on here is called short-circuiting, which Predicate#or does make use of (this is a documented feature you can rely on).

Zabuzard
  • Using stateful predicates is discouraged and error-prone – fps Aug 20 '21 at 18:07
  • @fps Generally I would agree on this, but do you have an alternative? I don't. That said, I don't think there is anything bad with it in this particular situation. Keep the predicate short-lived and tied to this stream iteration only (don't put it into a variable, don't expose it) and it is safe. – Zabuzard Aug 20 '21 at 18:07
  • 1
    Uploaded an answer – fps Aug 20 '21 at 18:14
  • 1
    @fps That's a good alternative, I like it. I added a disclaimer to my answer so that our answers can co-exist in peace, haha. – Zabuzard Aug 20 '21 at 18:17
  • 1
    Thanks for the edit! Yes, your solution is correct, though a little more complex. It doesn't follow the java docs recommendation of using *stateless* predicates, either. But sometimes it's OK to break the rules :) – fps Aug 20 '21 at 18:21
  • 1
    The problem of stateful predicates is demonstrated by the dependency on the order. It must be `distinctByKey(Candidate::getJob).or(acceptItalian)`, to be sure that the jobs of Italians still get recorded, even if they are accepted anyway, to lock out other non-Italian candidates with the same job. But even then, there’s still the problem that it makes a difference whether the non-Italian candidates have been seen first. – Holger Aug 23 '21 at 14:08
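The order dependency discussed in these comments can be checked with a small sketch that runs both predicate orderings over the question's sample data on a sequential stream. The class name PredicateOrderDemo, the method keptIds with its italianFirst flag, and the compact Candidate stand-in are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateOrderDemo {

    // Hypothetical stand-in for the question's Candidate class.
    static final class Candidate {
        final long id;
        final String fullName, job, address;

        Candidate(long id, String fullName, String job, String address) {
            this.id = id;
            this.fullName = fullName;
            this.job = job;
            this.address = address;
        }

        long getId() { return id; }
        String getJob() { return job; }
        String getAddress() { return address; }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Filters with a fresh predicate pair and returns the surviving ids.
    // italianFirst = true  -> acceptItalian.or(distinctByKey(...))
    // italianFirst = false -> distinctByKey(...).or(acceptItalian)
    static List<Long> keptIds(List<Candidate> input, boolean italianFirst) {
        Predicate<Candidate> acceptItalian = c -> "Italy".equals(c.getAddress());
        Predicate<Candidate> firstPerJob = distinctByKey(Candidate::getJob);
        Predicate<Candidate> combined = italianFirst
            ? acceptItalian.or(firstPerJob)
            : firstPerJob.or(acceptItalian);
        return input.stream()
            .filter(combined)
            .map(Candidate::getId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Candidate peter = new Candidate(1L, "Peter Bright", "IT Engineer", "Italy");
        Candidate patrick = new Candidate(2L, "Patrick Manon", "Electronics engineer", "Spain");
        Candidate bob = new Candidate(3L, "Bob Jina", "IT Engineer", "Suisse");
        Candidate alexander = new Candidate(4L, "Alexander Layo", "Security Engineer", "UK");
        List<Candidate> input = Arrays.asList(peter, patrick, bob, alexander);

        // Italian check first: Peter's job is never recorded, so Bob passes too.
        System.out.println(keptIds(input, true));  // prints [1, 2, 3, 4]
        // Job check first: Peter's job is recorded, so Bob is rejected.
        System.out.println(keptIds(input, false)); // prints [1, 2, 4]
        // Same predicate, but Bob now precedes Peter: both are kept.
        System.out.println(keptIds(Arrays.asList(bob, peter, patrick, alexander), false)); // prints [3, 1, 2, 4]
    }
}
```

On this data only the `distinctByKey(...).or(acceptItalian)` ordering reproduces the outcome requested in the question, and even that result changes once the non-Italian candidate comes first in the input.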