We are running a pipeline in GCP Dataflow and have run into the maximum message size for a Pub/Sub message [1]. When this happens, the pipeline's lag time starts to build up, and the pipeline eventually grinds to a halt...
The log message [1] was produced in GCP Stackdriver under 'dataflow_step'.
My question: is there a way to define error handling in the pipeline, so that
.apply(PubsubIO.writeMessages()
.to("topic")
.withTimestampAttribute(Instant.now().toString()));
could be extended with something like
.onError(...perform error handling ...)
in a fluent manner similar to the Java 8 Streams API, which would allow the pipeline to continue with the outputs that fit within the Pub/Sub limits.
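In the meantime, the best workaround I can think of is to split oversized messages into a dead-letter side output before the write, roughly along the lines below. This is only a sketch using Beam's multi-output ParDo; messages, the tag names, and the 7MB threshold are my own placeholders, not an existing PubsubIO feature.

import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the two outputs: messages small enough to publish, and oversized ones.
final TupleTag<PubsubMessage> publishableTag = new TupleTag<PubsubMessage>() {};
final TupleTag<PubsubMessage> oversizedTag = new TupleTag<PubsubMessage>() {};

// "messages" is assumed to be the PCollection<PubsubMessage> currently feeding the write.
PCollectionTuple split = messages.apply("SplitBySize",
    ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
      // Threshold chosen with headroom below the 10MB publish limit,
      // since the log suggests the request encoding overhead counts against it.
      private static final int MAX_PAYLOAD_BYTES = 7 * 1024 * 1024;

      @ProcessElement
      public void processElement(ProcessContext c) {
        PubsubMessage msg = c.element();
        if (msg.getPayload().length <= MAX_PAYLOAD_BYTES) {
          c.output(msg); // small enough: goes to the main output
        } else {
          c.output(oversizedTag, msg); // too big: routed to the side output
        }
      }
    }).withOutputTags(publishableTag, TupleTagList.of(oversizedTag)));

// Only publish the messages that fit.
split.get(publishableTag)
    .apply(PubsubIO.writeMessages().to("topic"));

The oversized branch (split.get(oversizedTag)) could then be logged, written to GCS, or split into smaller messages, instead of failing the bundle and stalling the pipeline.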
Other solutions to deal with this situation are most welcome.
Thank you, Christophe Bouhier
[1] Could not commit request due to validation error: generic::invalid_argument: Pubsub publish requests are limited to 10MB, rejecting message over 7MB to avoid exceeding limit with byte64 request encoding.