Make an iterative multi-threaded method

Question

I have a method named find_duplicates(List<DP> dp_list) which takes an ArrayList of my custom data type DP. Each DP has a String named 'ID' which should be unique for each DP.

My method goes through the whole list and adds any DP which does not have a unique ID to another ArrayList, which is returned when the method finishes. It also changes a boolean field isUnique of the DP from true to false.

I want to make this method multi-threaded, since each check of an element is independent of other elements' checks. But for each check the thread would need to read the dp_list. Is it possible to give the read access of the same List to different threads at the same time? Can you suggest a method to make it multithreaded?

Right now my code looks like this:

List<DP> find_duplicates(List<DP> dp_list){
    List<DP> dup_list = new ArrayList<>();
    for(DP d: dp_list){
        // Adds d to dup_list and sets d.isUnique=false if d.ID is not unique
    }
    return dup_list;
}

You can use java-8 parallel stream – Afridi, Jun 08 '18 at 11:24

Paul Benn · Answer 1 · score 1 · 2018-06-08T11:35:53.220

List<DP> unique = dp_list.stream().parallel().distinct().collect(Collectors.toList());

Then just find the difference between the original list and the list of unique elements and you have your duplicates.

Obviously you will need a filter if your items are only unique by one of their fields - a quick SO search for "stream distinct by key" can provide a myriad of ways to do that.
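As a rough illustration of the "distinct by key" idea mentioned above, here is a minimal sketch that filters on the ID through a thread-safe set. The names Item, DistinctByKeyDemo and laterOccurrences are hypothetical and not from the question; Item merely stands in for DP. Note the assumption baked into this approach: it returns only the second and later occurrences of each ID, not the first one, and with a parallel stream it is unspecified which occurrence counts as the first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the asker's DP type; only the ID matters here.
class Item {
    private final String id;
    Item(String id) { this.id = id; }
    String getId() { return id; }
}

class DistinctByKeyDemo {
    // Returns the items whose ID had already been seen when they were tested.
    // The first item seen for each ID is NOT part of the result.
    static List<Item> laterOccurrences(List<Item> items) {
        // Thread-safe set, so the predicate can be called from several threads.
        Set<String> seen = ConcurrentHashMap.newKeySet();
        return items.parallelStream()
                // Set.add returns false when the ID is already present.
                .filter(item -> !seen.add(item.getId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("a"), new Item("b"), new Item("a"),
                new Item("c"), new Item("b"), new Item("a"));
        // Six items with three distinct IDs leave three extra occurrences.
        System.out.println(laterOccurrences(items).size()); // prints 3
    }
}
```

If every element with a repeated ID is needed, including the first occurrence, grouping or counting by ID is probably a better fit than this filter.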

edited Jun 08 '18 at 11:35

answered Jun 08 '18 at 11:25


score 1 · Answer 2 · answered Jun 08 '18 at 11:45

It seems like you want to leverage parallelism where possible. First and foremost, I'd suggest measuring your code, whether it uses an imperative loop or a sequential stream. Only if the measurements suggest that going parallel would really improve performance should you switch to a parallel stream. See here for help deciding when to use a parallel stream.

As for accomplishing the task at hand, it can be done as follows:

List<DP> find_duplicates(List<DP> dp_list){
    List<DP> dup_list = dp_list.stream() //dp_list.parallelStream()
            .collect(Collectors.groupingBy(DP::getId))
            .values()
            .stream()
            .filter(e -> e.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());

    dup_list.forEach(s -> s.setUnique(false));
    return dup_list;
}

This creates a stream from the source, groups the elements by their IDs, retains every element whose ID is shared with at least one other element, and finally sets the isUnique field of each retained element to false.
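For anyone who wants to try the approach end to end, here is a self-contained sketch. The DP class below is a hypothetical minimal version (the real class in the question is not shown), and FindDuplicatesDemo and findDuplicates are illustrative names. It also swaps in Collectors.groupingByConcurrent with parallelStream() as one possible parallel variant, which is my assumption rather than part of the answer above; the order of the returned elements is not guaranteed in this variant.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal DP: just an ID and the isUnique flag from the question.
class DP {
    private final String id;
    private boolean unique = true;
    DP(String id) { this.id = id; }
    String getId() { return id; }
    boolean isUnique() { return unique; }
    void setUnique(boolean unique) { this.unique = unique; }
}

class FindDuplicatesDemo {
    static List<DP> findDuplicates(List<DP> dpList) {
        List<DP> dupList = dpList.parallelStream()
                // Group by ID into a single concurrent map shared by all threads.
                .collect(Collectors.groupingByConcurrent(DP::getId))
                .values()
                .stream()
                // Keep only the groups in which an ID occurs more than once.
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());

        // The flags are set sequentially, after the parallel grouping is done.
        dupList.forEach(dp -> dp.setUnique(false));
        return dupList;
    }

    public static void main(String[] args) {
        List<DP> input = Arrays.asList(
                new DP("a"), new DP("b"), new DP("a"), new DP("c"));
        List<DP> dups = findDuplicates(input);
        System.out.println(dups.size());              // prints 2
        System.out.println(input.get(1).isUnique());  // prints true ("b" is unique)
    }
}
```

The list is only read during the parallel phase and the DP objects are mutated afterwards on a single thread, so no explicit locking is needed in this sketch.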

score 0 · Answer 3 · answered Jun 08 '18 at 11:25

There are better ways to do this. All you need to do is acquire the lock on the list, check whether the item exists, and then carry on with any further processing.

void find_duplicates(List<DP> dp_list, DP item){
    synchronized(dp_list){
        if(dp_list.contains(item)){
            //Set your flags
        }
    }
}
