I'm using the default Dataflow template GCS to Pub/Sub. The input files in Cloud Storage are about 300 MB each, with 2-3 million rows per file.
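For context, my understanding is that the template is roughly equivalent to the minimal Beam pipeline sketched below: it reads each file line by line and publishes one Pub/Sub message per line. The bucket path and topic name here are just placeholders, not the template's real option names.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TextToPubsubSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadLines", TextIO.read().from("gs://my-bucket/input/*.csv"))            // placeholder input pattern
     .apply("PublishLines", PubsubIO.writeStrings().to("projects/my-project/topics/my-topic")); // placeholder topic

    p.run();
  }
}
```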
When I launch the Dataflow batch job, the following error is raised:
Error message from worker: javax.naming.SizeLimitExceededException: Pub/Sub message size (1089680070) exceeded maximum batch size (7500000) org.apache.beam.sdk.io.gcp.pubsub.PubsubIO$Write$PubsubBoundedWriter.processElement(PubsubIO.java:1160)
From the documentation: Pub/Sub accepts a maximum of 1,000 messages in a batch, and the size of a batch cannot exceed 10 megabytes.
Does that mean I have to split the input files into 10 MB chunks, or into batches of 1,000 messages, before publishing?
What is the recommended way to load such large files (300 MB each) into Pub/Sub?
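If pre-splitting really is required, is something like the rough sketch below the expected approach? It cuts a file into line-aligned chunks of at most ~10 MB before upload (the file names and the size limit are just illustrative assumptions on my part).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SplitForPubsub {
  private static final long MAX_CHUNK_BYTES = 10L * 1024 * 1024; // ~10 MB per chunk (assumed limit)

  public static void main(String[] args) throws IOException {
    Path input = Paths.get("input.csv"); // illustrative file name
    int chunkIndex = 0;
    long written = 0;
    BufferedWriter out = null;
    try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      String line;
      while ((line = in.readLine()) != null) {
        long lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
        // Start a new chunk file when the current one would exceed the size cap.
        if (out == null || written + lineBytes > MAX_CHUNK_BYTES) {
          if (out != null) out.close();
          out = Files.newBufferedWriter(Paths.get("chunk-" + (chunkIndex++) + ".csv"), StandardCharsets.UTF_8);
          written = 0;
        }
        out.write(line);
        out.newLine();
        written += lineBytes;
      }
    } finally {
      if (out != null) out.close();
    }
  }
}
```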
Thanks in advance for your help.