7

Suppose I have a class like this:

class Person {
  String name;
  String uid;
  String phone;
}

I am trying to group by all the fields of the class. How do I use parallel streams in Java 8 to convert a

List<Person> into a Map<String, Set<Person>>

where each key of the map is a value of one of the fields of the class? The following Java 8 example groups by a single field; how can I do this for all fields of a class, into a single Map?

ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
    .parallelStream()
    .collect(
        Collectors.groupingByConcurrent(Person::getGender));
Ousmane D.
user3665053

2 Answers

5

You can chain your grouping collectors, which gives you a multi-level map. However, this is not ideal if you want to group by more than, say, two fields.
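For illustration, chaining two grouping collectors might look like the sketch below. The class name NestedGroupingSketch, the method byNameThenUid and the Person constructor are hypothetical names added for the demo; only the three fields come from the question. Note that Collectors.groupingBy rejects null keys, so this assumes every field is non-null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGroupingSketch {

    // Minimal stand-in for the question's Person class (constructor is an assumption).
    static final class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Groups by name first, then by uid inside each name bucket.
    // Every additional field would add another level of Map nesting.
    static Map<String, Map<String, List<Person>>> byNameThenUid(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.groupingBy(p -> p.name,
                        Collectors.groupingBy(p -> p.uid)));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", "u1", "111"),
                new Person("Ann", "u2", "222"),
                new Person("Bob", "u3", "333"));

        Map<String, Map<String, List<Person>>> grouped = byNameThenUid(people);
        // Prints the uids recorded under the name "Ann".
        System.out.println(grouped.get("Ann").keySet());
    }
}
```

The return type already shows why this scales poorly: three fields would need a Map nested three levels deep.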

A better option is to override the equals and hashCode methods in your Person class so that two objects are considered equal when all of the fields in question are equal. You can then group by the Person itself, i.e. groupingByConcurrent(Function.identity()), in which case you end up with:

ConcurrentMap<Person, List<Person>> resultSet = ....

Example:

class Person {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (name != null ? !name.equals(person.name) : person.name != null) return false;
        if (uid != null ? !uid.equals(person.uid) : person.uid != null) return false;
        return phone != null ? phone.equals(person.phone) : person.phone == null;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result + (uid != null ? uid.hashCode() : 0);
        result = 31 * result + (phone != null ? phone.hashCode() : 0);
        return result;
    }

    private String name;
    private String uid; // these should be private, don't expose
    private String phone;

   // getters where necessary
   // setters where necessary
}

then:

ConcurrentMap<Person, List<Person>> resultSet = list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
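
As a rough end-to-end check, the sketch below puts the pieces together in one runnable file. It is only an illustration: the class name IdentityGroupingSketch, the method groupEqual and the constructor are made up for the demo, and equals/hashCode are written with java.util.Objects helpers, which should behave the same as the hand-written null checks above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IdentityGroupingSketch {

    static final class Person {
        private final String name;
        private final String uid;
        private final String phone;

        // Constructor is an assumption; the original class does not show one.
        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Person other = (Person) o;
            // Null-safe comparison of all three fields.
            return Objects.equals(name, other.name)
                    && Objects.equals(uid, other.uid)
                    && Objects.equals(phone, other.phone);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, uid, phone);
        }
    }

    // Persons that are equal on all fields end up in the same list.
    static ConcurrentMap<Person, List<Person>> groupEqual(List<Person> list) {
        return list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
                new Person("Ann", "u1", "111"),
                new Person("Ann", "u1", "111"),  // equal to the first entry
                new Person("Bob", "u2", "222"));

        ConcurrentMap<Person, List<Person>> resultSet = groupEqual(list);
        System.out.println(resultSet.size()); // number of distinct persons
    }
}
```

Keep in mind that the keys here are whole Person objects, not the individual field values.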
Ousmane D.
5

You can do that with the static factory method Collector.of:

Map<String, Set<Person>> groupBy = persons.parallelStream()
    .collect(Collector.of(
        ConcurrentHashMap::new,
        ( map, person ) -> {
            map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
        },
        ( a, b ) -> {
            b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
            return a;
        }
    ));

As Holger suggested in the comments, the following approach may be preferable to the one above:

Map<String, Set<Person>> groupBy = persons.parallelStream()
     .collect(HashMap::new, (m, p) -> { 
         m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p); 
         m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p); 
         m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p); 
     }, (a, b) -> b.forEach((key, set) ->
         a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));

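It uses the three-argument overload of collect, which behaves the same as the Collector.of statement above. An ordinary HashMap is sufficient here because no CONCURRENT characteristic is specified.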
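
To try the three-argument collect version in isolation, a self-contained sketch could look like the following. The class name GroupByAllFieldsSketch, the method groupByEveryField and the Person constructor are hypothetical; the collect call itself mirrors the snippet above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByAllFieldsSketch {

    // Minimal stand-in for the question's Person class (constructor is an assumption).
    static final class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    static Map<String, Set<Person>> groupByEveryField(List<Person> persons) {
        return persons.parallelStream().collect(
                HashMap::new,                      // one plain HashMap per worker
                (m, p) -> {                        // file the person under each field value
                    m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
                },
                (a, b) -> b.forEach((key, set) ->  // merge partial maps, set by set
                        a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", "u1", "111"),
                new Person("Ann", "u2", "222"));

        Map<String, Set<Person>> groupBy = groupByEveryField(persons);
        System.out.println(groupBy.get("Ann").size()); // both persons share the name "Ann"
    }
}
```

One thing to be aware of: all field values share a single key space, so a name that happens to equal some other person's uid or phone would land in the same set.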

marc_s
Lino
  • 2
    There is no need to use a `ConcurrentHashMap` here, an ordinary `HashMap` will do. You would need a `ConcurrentHashMap` if you specified the `CONCURRENT` characteristic. You can, by the way, simplify the solution further by using the three-arg version of `collect`: `persons.parallelStream() .collect(HashMap::new, (m, p) -> { m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p); }, (a, b) -> b.forEach((key, set) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));` – Holger Dec 18 '17 at 14:57
  • @Holger Well, I used the ConcurrentHashMap because the OP was talking about parallel streams, but I forgot to add the characteristics. I have also edited your suggestion into the answer. – Lino Dec 18 '17 at 15:23
  • 1
    A `Collector` without the `CONCURRENT` characteristic still can be used with a parallel stream. In that case, the Stream implementation takes care to use the functions appropriately; that’s why it works with arbitrary collections and maps and doesn’t require a `ConcurrentHashMap`. See [this Q&A](https://stackoverflow.com/q/41041698/2711488) for a discussion of the differences. – Holger Dec 19 '17 at 07:42
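
For comparison with the comments above, here is a hedged sketch of what a collector that does declare the `CONCURRENT` characteristic might look like. The names ConcurrentCollectorSketch and byEveryFieldConcurrent are hypothetical. Since all threads may then accumulate into one shared map, the assumption is that the value sets must be thread-safe too, hence `ConcurrentHashMap.newKeySet()` instead of `HashSet`. Also note that `ConcurrentHashMap` rejects null keys, so every field is assumed to be non-null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

public class ConcurrentCollectorSketch {

    // Minimal stand-in for the question's Person class (constructor is an assumption).
    static final class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    static Collector<Person, ConcurrentMap<String, Set<Person>>, ConcurrentMap<String, Set<Person>>>
            byEveryFieldConcurrent() {
        return Collector.of(
                ConcurrentHashMap::new,
                (map, p) -> {
                    // computeIfAbsent on ConcurrentHashMap is atomic; the sets are concurrent as well.
                    map.computeIfAbsent(p.name, k -> ConcurrentHashMap.newKeySet()).add(p);
                    map.computeIfAbsent(p.uid, k -> ConcurrentHashMap.newKeySet()).add(p);
                    map.computeIfAbsent(p.phone, k -> ConcurrentHashMap.newKeySet()).add(p);
                },
                (a, b) -> {
                    // Still required by the Collector contract for non-concurrent evaluation.
                    b.forEach((key, set) ->
                            a.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).addAll(set));
                    return a;
                },
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", "u1", "111"),
                new Person("Ann", "u2", "222"));

        ConcurrentMap<String, Set<Person>> groupBy =
                persons.parallelStream().collect(byEveryFieldConcurrent());
        System.out.println(groupBy.keySet());
    }
}
```

The `UNORDERED` characteristic is included because a concurrent reduction is only performed when the stream is unordered or the collector is unordered.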