But I have a question: how can I configure backpressure for my job when I want to use Trigger.Once?
In Spark 2.4 I have a use case: backfill some data and then start the stream.
So I use Trigger.Once, but the backfill can be very big and sometimes puts too much load on my disks (because of shuffles) and on driver memory (because the FileIndex is cached there).
So I use maxOffsetsPerTrigger and maxFilesPerTrigger to control how much data Spark processes per batch; that's how I configure backpressure. Roughly like the sketch below.
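A minimal sketch of how I wire this up today on 2.4 (schema, paths, and the limit value are placeholders, and I only show the file-source side with maxFilesPerTrigger; the Kafka side with maxOffsetsPerTrigger is analogous):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object BackfillOnce {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("backfill-trigger-once")
      .getOrCreate()

    // Placeholder schema; a streaming file source needs it declared up front.
    val schema = new StructType()
      .add("id", StringType)
      .add("event_time", TimestampType)
      .add("payload", StringType)

    // maxFilesPerTrigger caps how many new files a micro-batch picks up,
    // which keeps shuffle size and the driver-side FileIndex bounded.
    val events = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", 1000)      // placeholder limit
      .parquet("/data/landing/events")         // placeholder input path

    // Trigger.Once: run one batch and stop; on 2.4 the limit above is what
    // keeps that batch from swallowing the whole backlog at once.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/backfilled/events")              // placeholder
      .option("checkpointLocation", "/checkpoints/backfill")  // placeholder
      .trigger(Trigger.Once())
      .start()

    query.awaitTermination()
  }
}
```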
And now you are removing this ability, so can someone suggest a new way to go?