1

I want to join every item of one list with every item of another list (all n x m combinations).

Here is my code:

1.

List<String> pairs = list1.stream()
                          .parallel()
                          .flatMap(item1 -> list2.stream()
                                                 .parallel()
                                                 .map(item2 -> item1 + " " + item2))
                          .collect(Collectors.toList());

When I tried similar methods, the one above was the fastest. But I suspect it is not actually running in parallel, because the order of the result is always the same.

Is there any faster way? The order of the final result list does not matter.

Thanks!

=======================================

I tried two more methods:

2.

List<String> pairs = new ArrayList<>();
for (String item1 : list1)
    for (String item2 : list2)
        pairs.add(item1 + " " + item2);

3.

pool.submit(() -> {
    List<String> pairs = list1.stream()
                              .parallel()
                              .flatMap(item1 -> list2.stream()
                                                     .parallel()
                                                     .map(item2 -> item1 + " " + item2))
                              .collect(Collectors.toList());
}).get();

sjpark

2 Answers

1

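The stream ends up largely sequential because of flatMap: each inner stream it produces is consumed sequentially, so the inner .parallel() call has no effect and only the outer stream over list1 can be split across threads. Here is the article about efficiently splittable streams.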
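As a rough illustration, here is a minimal sketch of two ways to build all n x m pairs. The class and method names (CartesianPairs, viaFlatMap, viaIndexRange) are made up for this example. The first keeps flatMap but marks only the outer stream parallel. The second is a different technique, not the asker's method: it maps a single index range over all pairs, which splits evenly, and it assumes both lists support fast random access (for example ArrayList). Whether either is actually faster depends on the data and should be measured.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class CartesianPairs {

    // Only the outer stream is parallel; the inner stream is left sequential
    // because flatMap consumes it sequentially regardless.
    static List<String> viaFlatMap(List<String> list1, List<String> list2) {
        return list1.parallelStream()
                .flatMap(item1 -> list2.stream().map(item2 -> item1 + " " + item2))
                .collect(Collectors.toList());
    }

    // Alternative sketch: one flat index range over all n * m pairs.
    // Assumes list1 and list2 offer cheap get(int), e.g. ArrayList.
    static List<String> viaIndexRange(List<String> list1, List<String> list2) {
        int m = list2.size();
        // multiplyExact throws instead of silently overflowing int
        int total = Math.multiplyExact(list1.size(), m);
        return IntStream.range(0, total)
                .parallel()
                .mapToObj(i -> list1.get(i / m) + " " + list2.get(i % m))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b");
        List<String> list2 = Arrays.asList("x", "y");
        System.out.println(viaFlatMap(list1, list2));    // [a x, a y, b x, b y]
        System.out.println(viaIndexRange(list1, list2)); // [a x, a y, b x, b y]
    }
}
```

Both variants return the pairs in encounter order even though they run in parallel, so an unchanged order says nothing about whether multiple threads were used.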

There are several ways to combine two or more streams element by element. One of them is Guava's Streams.zip:

Streams
  .zip(list1.stream(), list2.stream(), (item1, item2) -> item1 + ":" + item2)

Be aware that the resulting stream is also not efficiently splittable, so it may harm parallel performance. You can find more ways of doing this here:

Zipping Collections in Java
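For comparison, here is a JDK-only sketch of what zipping means, written without Guava. The class name PairwiseZip and its zip method are hypothetical stand-ins, not Guava's implementation. It pairs elements by position and stops at the shorter list, so it yields at most min(n, m) results rather than n x m.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PairwiseZip {

    // Pairs list1[i] with list2[i]; extra elements of the longer list are ignored.
    // Assumes both lists offer cheap get(int), e.g. ArrayList.
    static List<String> zip(List<String> list1, List<String> list2) {
        int n = Math.min(list1.size(), list2.size());
        return IntStream.range(0, n)
                .mapToObj(i -> list1.get(i) + ":" + list2.get(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c");
        List<String> list2 = Arrays.asList("x", "y");
        System.out.println(zip(list1, list2)); // [a:x, b:y]
    }
}
```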

Uladzislau Kaminski
  • Thank you for your answer, but I want to join all items with each other (n x m). I will study Guava's Streams, thanks! – sjpark Jun 18 '21 at 09:03
  • @sjpark I've got you. It is called the Cartesian product in math. https://rosettacode.org/wiki/Cartesian_product_of_two_or_more_lists#Java – Uladzislau Kaminski Jun 18 '21 at 09:10
1

But I feel like this method is not parallel. (because the order of result is always same..!)

Your reasoning here is incorrect. toList() can return items in order after processing them in parallel. From the docs:

In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.

For more detail on how that works, we can look at the implementation of Collectors.toList:

    public static <T>
    Collector<T, ?, List<T>> toList() {
        return new CollectorImpl<>(ArrayList::new, List::add,
                                   (left, right) -> { left.addAll(right); return left; },
                                   CH_ID);
    }

Each thread processing the stream creates a separate ArrayList, and adds the processed elements to the list in order with add. Later these separate lists are merged using addAll. Since each thread processes an ordered batch of items, and the merge process preserves the order of batches, overall order is preserved.
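The behavior described above can be observed with a small sketch. The class name OrderedParallelCollect, the squares method, and the threadsSeen set are all made up for this illustration. It records which threads touched the elements and then collects with Collectors.toList(); the result comes back in encounter order no matter how many threads took part.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class OrderedParallelCollect {

    // Names of the threads that processed at least one element.
    static final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();

    // Squares 0..n-1 in parallel; keep n small enough that i * i fits in an int.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .peek(i -> threadsSeen.add(Thread.currentThread().getName()))
                .mapToObj(i -> i * i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = squares(10_000);
        // The first elements are always 0, 1, 4, 9, 16 in that order.
        System.out.println(result.subList(0, 5));
        // The number of threads varies by machine and run.
        System.out.println("threads used: " + threadsSeen.size());
    }
}
```

On a multi-core machine the thread count is usually greater than one, yet the list is still in order, so a stable order is not evidence of sequential execution.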

MikeFHay