Is there a way to determine total job parallelism or number of slots required to run a Flink job(before it is run)

Question

Is there a way to determine the total number of task slots that will be required to run the job from either the execution plan or in some other way without having to actually start the job first.

According to this doc: https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html

"A Flink cluster needs exactly as many task slots as the highest parallelism used in the job. No need to calculate how many tasks (with varying parallelism) a program contains in total."

If I get the execution plan from StreamExecutionEnvironment(after setup but without actually executing the job) and get the max parallelism for any node from the list of nodes in the execution plan json, would that be sufficient to determine the number of task slots required to run the job.

Are there any situations where this ceases to be the case? Or any caveats to keep in mind?

score 3 · Answer 1 · answered Sep 10 '19 at 17:30

In the general case, one can compute the required number of slots for a given Flink job the following way: For every slot sharing group g (denoting a group of operators which can be deployed into the same slot), one needs to find the operator with the maximum parallelism p_max_g. Now one needs to add these numbers up for every slot sharing group in the job slots = sum_(g in G) p_max_g in order to obtain the number of required slots.

In most cases (if the user has not set any slot sharing groups), then there should only exist one slot sharing group G = {g}. This entails that Flink can deploy one subtask of every operator into a one and the same slot.

One special case are batch jobs (bounded streams) if they use blocking data exchanges. In this case one can run the different slot sharing groups (given that they align with the blocking data exchanges/operator edges) sequentially one after the other.

Unfortunately, ExecutionEnvironment.getExecutionPlan does not print the slot sharing group of an operator. Hence, calculating the required number of slots based on the stringified execution plan only works if there is a single slot sharing group.

Is there a way to determine total job parallelism or number of slots required to run a Flink job(before it is run)

1 Answers1