3

I currently have a java program that uses nested for loops to compute the union and intersection of a list of set of integers. How to do this using java parallel streams ? The code i have currently is as follows

for(Set<Integer> x : listA) {
  for (Set<Integer> y : listB) {
       Set u = Sets.union(x,y); // Uses Guava library
       Set i = Sets.intersection(x,y);
  }
}

I would like to make this fast as listA and listB are large.

Ram K
  • 1,746
  • 2
  • 14
  • 23
  • 5
    Since you are not doing anything with the result, just removing the entire code is the fastest solution. Otherwise, you should include the actual operation that is supposed to run in parallel. – Holger Mar 13 '19 at 09:30
  • @Holger is it not possible to make the union operation itself parallel ? – Ram K Mar 13 '19 at 09:37
  • 1
    Not with significant performance improvement. It’s rather likely to get worse performance when doing `union` or `intersection` in parallel. As you said yourself, `listA` and `listB` are large, so you should focus on processing these lists in parallel. – Holger Mar 13 '19 at 09:40
  • Parallellization is not easy. Maybe this approach is good enough: https://stackoverflow.com/questions/7574311/efficiently-compute-intersection-of-two-sets-in-java – Adriaan Koster Mar 13 '19 at 09:40
  • @Holger thank you for your comment – Ram K Mar 13 '19 at 09:44

4 Answers4

2

You don't need streams for union, however, you can use it for intersection, e.g.:

Set<Integer> setA = new HashSet<>(Arrays.asList(1,2,3));
Set<Integer> setB = new HashSet<>(Arrays.asList(2,3,4));
Set<Integer> union = new HashSet<>();
union.addAll(setA);
union.addAll(setB);

Set<Integer> intersection = setA.parallelStream()
        .filter(setB::contains)
        .collect(Collectors.toSet());

System.out.println("Union : " + union);
System.out.println("Intersection : " +intersection);

Update

The above code finds intersection and union using Java's native libraries and streams. However, if you have a list of sets then you can wrap the above code in function and call it from stream that iterates two lists, e.g.:

private static void unionAndIntersection(Set<Integer> setA, Set<Integer> setB) {
    Set<Integer> union = new HashSet<>();
    union.addAll(setA);
    union.addAll(setB);

    Set<Integer> intersection = setA.parallelStream()
            .filter(setB::contains)
            .collect(Collectors.toSet());

    System.out.println("Union : " + union);
    System.out.println("Intersection : " +intersection);
}

public static void main(String[] args){ 
    List<Set<Integer>> listA = new ArrayList<>();
    List<Set<Integer>> listB = new ArrayList<>();
    listA.stream()
        .forEach(a -> {
            listB.stream()
            .forEach(b -> unionAndIntersection(a, b));
        });
}
Darshan Mehta
  • 30,102
  • 11
  • 68
  • 102
  • 2
    `How to do this using java parallel streams ?` Is this not the question? – Darshan Mehta Mar 13 '19 at 09:35
  • @DarshanMehta Well, the updated answer replaces the two nested `for` loops with two nested `forEach` calls on streams... Not sure how this would perform better – BackSlash Mar 13 '19 at 09:50
  • @BackSlash well, without knowing what the OP wants to do with the result, there is no better alternative. – Holger Mar 13 '19 at 09:54
2

If you ensure that y (and x) are/become sorted, like the class TreeSet, the following uses a special merging (internal method addAllForTreeSet).

for (Set<Integer> x : listA) {
    for (SortedSet<Integer> y : listB) {
        SortedSet<Integer> u = new TreeSet(x);
        u.addAll(y);
        SortedSet<Integer> i = new TreeSet(x);
        i.retainAll(y);
    }
}

Whether this is actually faster I am not sure.

Better would be if the integers are not too wild, limited to say 10_000. If the values are non-negative, one can immediately use a BitSet instead of a Set<Integer>.

This is unbeatable. Use a BitSet constructor with a probable capacity (like 10_000).

for (BitSet x : listA) {
    for (BitSet y : listB) {
        BitSet u = x.clone();
        u.or(y);
        BitSet i = x.clone();
        i.and(y);
    }
}

Ýou might use a parallel stream to save a factor equal to the number of processors.

listA.parallelStream().forEach(x -> {});

That is a secundary optimisation.

Guava I did not use in the last years, did it not have sets of primitive type int?

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • It depends on the particular value distribution whether `BitSet` yields to better performance. For a lot of practical use cases, it does. But consider a set containing only zero and `Integer.MAX_VALUE`… – Holger Mar 13 '19 at 09:48
  • @Joop Eggen If you dont mind can you please show me an example of how to convert a set of integers to a bitset ? Assuming my range of integers is from 0 to 15000 – Ram K Mar 13 '19 at 09:49
  • 4
    @Koba `BitSet bs = set.stream().collect(BitSet::new, BitSet::set, BitSet::or);` and `Set set = bs.stream().boxed().collect(Collectors.toSet());` for the other direction. – Holger Mar 13 '19 at 09:49
  • @Holger thanks, you beat me to it, and with streams, mmm. – Joop Eggen Mar 13 '19 at 09:52
  • @Holger thank you again for your help. will try it shortly. – Ram K Mar 13 '19 at 09:55
2

Intersection:

List<T> intersect = list1.stream()
                         .filter(list2::contains)
                         .collect(Collectors.toList());

Union:

List<T> union = Stream.concat(list1.stream(), list2.stream())
                                    .distinct()
                                    .collect(Collectors.toList());  
Ali Azim
  • 160
  • 1
  • 15
1

Worth noting that you don't have to use streams for union as well as for intersection. There is retainAll method that retains only the elements in this collection that are contained in the specified collection:

Set<Integer> setA = new HashSet<>(Arrays.asList(1,2,3));
Set<Integer> setB = new HashSet<>(Arrays.asList(2,3,4));

setA.retainAll(setB);  // now setA has intersection
Ruslan
  • 6,090
  • 1
  • 21
  • 36
  • 2
    This should be a comment. The question is: how to compute union and intersection using parallel streams instead of nested loop? - You aren't answering that question, you are rather giving advice on how to avoid using external libraries to do the job. – BackSlash Mar 13 '19 at 09:48