Java 8 Parallel Stream Concurrent Grouping

Question

Suppose I have the following class:

class Person {
  String name;
  String uid;
  String phone;
}

I am trying to group by all the fields of the class. How do I use parallel streams in Java 8 to convert a

List<Person> into Map<String,Set<Person>>

where each key of the map is the value of one of the fields in the class? The following Java 8 example groups by a single field; how can I do it for all fields of a class into a single Map?

ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
    .parallelStream()
    .collect(
        Collectors.groupingByConcurrent(Person::getGender));

Ousmane D. · Answer 1 · 2017-12-16T18:49:59.947

You can chain your grouping collectors, which gives you a multi-level map. However, this is not ideal if you want to group by, say, more than two fields.
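As a rough illustration of the chaining approach, here is a minimal sketch that groups by name and then by uid. The class name MultiLevelGrouping, the constructor, the getters and the sample data are assumptions made for this example; they are not part of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class MultiLevelGrouping {

    // Stand-in for the question's Person; constructor and getters are assumed.
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        String getName() { return name; }
        String getUid() { return uid; }
        String getPhone() { return phone; }
    }

    // Outer level is keyed by name, inner level by uid.
    static ConcurrentMap<String, ConcurrentMap<String, List<Person>>> byNameThenUid(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.groupingByConcurrent(
                        Person::getName,
                        Collectors.groupingByConcurrent(Person::getUid)));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", "u1", "555-0100"),
                new Person("Ann", "u2", "555-0101"),
                new Person("Bob", "u3", "555-0102"));

        // Each additional field would add one more level of nesting.
        System.out.println(byNameThenUid(people).get("Ann").keySet());
    }
}
```

Every extra field adds another nested map to the result type, which is why this gets unwieldy beyond two fields.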

The better option would be to override the equals and hashCode methods in your Person class to define when two objects are equal, which in this case means comparing all of the said fields. Then you can group by Person itself, i.e. groupingByConcurrent(Function.identity()), in which case you'll end up with:

ConcurrentMap<Person, List<Person>> resultSet = ....

Example:

class Person {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (name != null ? !name.equals(person.name) : person.name != null) return false;
        if (uid != null ? !uid.equals(person.uid) : person.uid != null) return false;
        return phone != null ? phone.equals(person.phone) : person.phone == null;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result + (uid != null ? uid.hashCode() : 0);
        result = 31 * result + (phone != null ? phone.hashCode() : 0);
        return result;
    }

    private String name;
    private String uid; // these should be private, don't expose
    private String phone;

   // getters where necessary
   // setters where necessary
}

then:

ConcurrentMap<Person, List<Person>> resultSet = list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
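To see this end to end, here is a small self-contained sketch. It uses Objects.equals and Objects.hash as a compact alternative to the hand-written methods above; the class name IdentityGrouping, the constructor and the sample data are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class IdentityGrouping {

    // Stand-in for Person; the constructor is assumed for the demo.
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Person other = (Person) o;
            // Null-safe comparison of all three fields.
            return Objects.equals(name, other.name)
                    && Objects.equals(uid, other.uid)
                    && Objects.equals(phone, other.phone);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, uid, phone);
        }
    }

    // Persons that are equal in every field end up in the same group.
    static ConcurrentMap<Person, List<Person>> groupByAllFields(List<Person> list) {
        return list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
                new Person("Ann", "u1", "555-0100"),
                new Person("Ann", "u1", "555-0100"),
                new Person("Bob", "u2", "555-0101"));

        System.out.println(groupByAllFields(list).size());
    }
}
```

Note that the keys here are whole Person objects, so the result is a ConcurrentMap<Person, List<Person>> rather than a map keyed by individual field values.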

score 5 · Accepted Answer · edited Nov 10 '19 at 10:07


You can do that with the static factory method Collector.of:

Map<String, Set<Person>> groupBy = persons.parallelStream()
    .collect(Collector.of(
        ConcurrentHashMap::new,
        ( map, person ) -> {
            map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
        },
        ( a, b ) -> {
            b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
            return a;
        }
    ));

As Holger suggested in the comments, the following approach may be preferable to the one above:

Map<String, Set<Person>> groupBy = persons.parallelStream()
     .collect(HashMap::new, (m, p) -> {
         m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
     }, (a, b) -> b.forEach((key, set) ->
         a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));

It uses the overloaded three-argument collect method, which behaves identically to my suggested statement above.
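For completeness, a runnable sketch of the three-argument collect version might look like the following. The class name FieldValueGrouping, the constructor, the getters and the sample data are assumptions for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FieldValueGrouping {

    // Stand-in for the question's Person; constructor and getters are assumed.
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        String getName() { return name; }
        String getUid() { return uid; }
        String getPhone() { return phone; }
    }

    // Registers every person under each of its three field values.
    static Map<String, Set<Person>> groupByFieldValues(List<Person> persons) {
        return persons.parallelStream()
                .collect(HashMap::new,
                        (m, p) -> {
                            m.computeIfAbsent(p.getName(), k -> new HashSet<>()).add(p);
                            m.computeIfAbsent(p.getUid(), k -> new HashSet<>()).add(p);
                            m.computeIfAbsent(p.getPhone(), k -> new HashSet<>()).add(p);
                        },
                        // Merges the partial map b of one thread into a.
                        (a, b) -> b.forEach((key, set) ->
                                a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", "u1", "555-0100"),
                new Person("Ann", "u2", "555-0101"),
                new Person("Bob", "u3", "555-0100"));

        Map<String, Set<Person>> groupBy = groupByFieldValues(persons);
        System.out.println(groupBy.keySet());
    }
}
```

Because all field values share one key space, a name that happens to equal some uid or phone string would land in the same set; whether that matters depends on the data.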

edited Nov 10 '19 at 10:07

marc_s

answered Dec 16 '17 at 18:46

Lino

There is no need to use a `ConcurrentHashMap` here, an ordinary `HashMap` will do. You would need a `ConcurrentHashMap` if you specified the `CONCURRENT` characteristic. You can, by the way, simplify the solution further by using the three-arg version of `collect`: `persons.parallelStream() .collect(HashMap::new, (m, p) -> { m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p); }, (a, b) -> b.forEach((key, set) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));` – Holger Dec 18 '17 at 14:57
@Holger Well, I used the concurrent hash map because the OP was talking about parallel, but I forgot to add the characteristics. I also edited your part into the answer. – Lino Dec 18 '17 at 15:23
A `Collector` without the `CONCURRENT` characteristic still can be used with a parallel stream. In that case, the Stream implementation takes care to use the functions appropriately; that’s why it works with arbitrary collections and maps and doesn’t require a `ConcurrentHashMap`. See [this Q&A](https://stackoverflow.com/q/41041698/2711488) for a discussion of the differences. – Holger Dec 19 '17 at 07:42
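
To make the distinction in these comments concrete, here is a hedged sketch of what a collector that does declare `CONCURRENT` could look like. It is not code from the answer: the class name `ConcurrentFieldGrouping`, the constructor, the getters and the sample data are assumptions. With `CONCURRENT`, all threads accumulate into one shared container, so both the map and the value sets have to be thread-safe; the sketch also declares `UNORDERED`, since `Stream.collect` only performs a concurrent reduction when the stream is unordered or the collector is `UNORDERED`. Fields are assumed to be non-null, because `ConcurrentHashMap` rejects null keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

class ConcurrentFieldGrouping {

    // Stand-in for the question's Person; constructor and getters are assumed.
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        String getName() { return name; }
        String getUid() { return uid; }
        String getPhone() { return phone; }
    }

    static ConcurrentMap<String, Set<Person>> groupByFieldValues(List<Person> persons) {
        return persons.parallelStream().collect(
                Collector.<Person, ConcurrentMap<String, Set<Person>>>of(
                        ConcurrentHashMap::new,
                        (map, p) -> {
                            // Shared container: the sets must be thread-safe too.
                            map.computeIfAbsent(p.getName(), k -> ConcurrentHashMap.newKeySet()).add(p);
                            map.computeIfAbsent(p.getUid(), k -> ConcurrentHashMap.newKeySet()).add(p);
                            map.computeIfAbsent(p.getPhone(), k -> ConcurrentHashMap.newKeySet()).add(p);
                        },
                        // Still required by Collector.of; used if the reduction is not concurrent.
                        (a, b) -> {
                            b.forEach((key, set) ->
                                    a.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).addAll(set));
                            return a;
                        },
                        Collector.Characteristics.CONCURRENT,
                        Collector.Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", "u1", "555-0100"),
                new Person("Ann", "u2", "555-0101"),
                new Person("Bob", "u3", "555-0100"));

        System.out.println(groupByFieldValues(persons).keySet());
    }
}
```

Without the two characteristics, the same task works with a plain `HashMap` and `HashSet`, as the comments point out, because the stream then gives each thread its own container and merges them with the combiner.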
