I have 2 PCollections:

PCollection<List<String>> ListA =
        pipeline.apply("getListA", ParDo.of(new getListA()));
PCollection<List<String>> ListB =
        pipeline.apply("getListB", ParDo.of(new getListB()));

ListA contains:

["1","2","3"]

ListB contains:

["A","B","C"]

How do I end up with a PCollection that contains:

[
 ["A","1"],["A","2"],["A","3"],
 ["B","1"],["B","2"],["B","3"],
 ["C","1"],["C","2"],["C","3"],
]

My search has pointed me to:

How to do a cartesian product of two PCollections in Dataflow?

But that answer deals with KV elements, using CoGroupByKey with 2 outputs. It may be possible to use CoGroupByKey to create the cartesian product of 2 lists, but I am not seeing how.

mattwilsn

2 Answers

It looks like you have a single element in each PCollection, so you just need to bring those two elements together; then you can compute the cartesian product yourself in a DoFn.

Something like:

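PCollectionList.of(ListA).and(ListB)
    .apply(Flatten.pCollections())
    // cast the null key to Void so the constant-key overload of WithKeys.of is chosen
    .apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create())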

After that, you'll have a PCollection with a single element, which is a KV(null, Iterable(ListA, ListB)), and you can generate the cartesian product with some for loops.
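The "some for loops" step could look roughly like the sketch below. It is plain Java with no Beam dependency: CartesianSketch and product are hypothetical names, and the assumption is that a DoFn would pull the two lists out of the grouped Iterable, call a helper like this, and emit each resulting pair. Note that the values grouped under a key arrive in no guaranteed order, so the pipeline would need its own way to tell ListA from ListB if the order inside each pair matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam: pairs every element of `outer`
// with every element of `inner` using two nested loops.
class CartesianSketch {
    static List<List<String>> product(List<String> outer, List<String> inner) {
        List<List<String>> result = new ArrayList<>();
        for (String o : outer) {
            for (String i : inner) {
                result.add(Arrays.asList(o, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listA = Arrays.asList("1", "2", "3");
        List<String> listB = Arrays.asList("A", "B", "C");
        // listB is the outer list so the pairs come out as [A, 1], [A, 2], ...
        System.out.println(product(listB, listA));
    }
}
```

Passing ListB as the outer list gives the ordering asked for in the question; swapping the arguments gives pairs that start with the numbers instead.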

danielm

You can use the Java 8 Stream map and reduce methods as follows:

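// imports needed: java.util.Arrays, java.util.Collections, java.util.List,
// java.util.stream.Collectors, java.util.stream.Stream
List<String> listA = Arrays.asList("1", "2", "3");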
List<String> listB = Arrays.asList("A", "B", "C");
List<List<String>> cartesianProduct = Stream.of(listA, listB)
        // represent each list element as a singleton list
        .map(list -> list.stream().map(Collections::singletonList)
                // Stream<List<List<String>>>
                .collect(Collectors.toList()))
        // intermediate output
        //[[1], [2], [3]]
        //[[A], [B], [C]]
        .peek(System.out::println)
        // combine the two lists of inner lists pairwise
        .reduce((list1, list2) -> list1.stream()
                // combinations of inner lists
                .flatMap(inner1 -> list2.stream()
                        // merge two inner lists into one
                        .map(inner2 -> Stream.of(inner1, inner2)
                                .flatMap(List::stream)
                                .collect(Collectors.toList())))
                // list of combinations
                .collect(Collectors.toList()))
        // returns List<List<String>>, otherwise an empty list
        .orElse(Collections.emptyList());
// final output
System.out.println(cartesianProduct);
//[[1, A], [1, B], [1, C], [2, A], [2, B], [2, C], [3, A], [3, B], [3, C]]
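
For exactly two lists, a nested flatMap may be easier to read than the general reduce above. The following is a minimal sketch under that assumption; TwoListProduct and product are hypothetical names. It takes listB as the outer list so that the pairs start with the letters, as in the question, whereas the reduce version above puts the numbers first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical two-list variant: each element of `outer` is expanded
// into one pair per element of `inner`.
class TwoListProduct {
    static List<List<String>> product(List<String> outer, List<String> inner) {
        return outer.stream()
                .flatMap(o -> inner.stream().map(i -> Arrays.asList(o, i)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listA = Arrays.asList("1", "2", "3");
        List<String> listB = Arrays.asList("A", "B", "C");
        // letters first, numbers second
        System.out.println(product(listB, listA));
    }
}
```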

See also: How can I make Cartesian product with Java 8 streams?

Community