0

I am attempting to implement a solution where I need to write data (json) messages from pubsub into GCS using dataflow. My question is exactly similar to this one

I need to write either based on windowing or element count. Here is the code sample for writes from the the above question:

windowedValues.apply(FileIO.<String, String>writeDynamic()
        .by(Event::getKey)
        .via(TextIO.sink())
        .to("gs://data_pipeline_events_test/events/")
        .withDestinationCoder(StringUtf8Coder.of())
        .withNumShards(1)
        .withNaming(key -> FileIO.Write.defaultNaming(key, ".json")));

The solution suggests using FileIO.WriteDynamic function. But i am not able to understand what .by(Event::getKey) does and where it comes from. Any help on this is greatly appreciated.

1 Answers1

0

It's partitioning elements into groups according to events' keys.

From my understanding, the events come from a PCollection using the KV class since it has the getKey method.

Note that :: is a new operator included in Java 8 that is used to refer a method of a class.

Héctor Neri
  • 1,384
  • 9
  • 13