
We are running an Oozie workflow that has a Shell action followed by a Spark action, i.e. a shell script and a Spark job that run in sequence.
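
For context, the workflow is shaped roughly like this (a minimal sketch; the node names, script name, and class are illustrative placeholders, not our actual definitions):

```xml
<workflow-app name="region-zone-area-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-step"/>

    <!-- Shell action (~50 secs); "prepare.sh" is a placeholder name -->
    <action name="shell-step">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>prepare.sh</exec>
            <file>${wfPath}/prepare.sh#prepare.sh</file>
        </shell>
        <ok to="spark-step"/>
        <error to="fail"/>
    </action>

    <!-- Spark action (~2 mins), runs only after the shell action succeeds -->
    <action name="spark-step">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>sample-spark-job</name>
            <class>com.example.Main</class>
            <jar>${wfPath}/lib/job.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```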

Running a single workflow:

  • Total: 3 mins
  • Shell action: 50 secs
  • Spark job: 2 mins
  • The rest of the time goes to Oozie initialization and YARN container allocation, which is absolutely fine.

Use case: we are supposed to run 700 instances of the same workflow at once (split by region, zone and area, which is a business requirement).


When we run the 700 instances of the same workflow, we notice a delay in the completion of all 700 workflows even though we have scaled the cluster linearly. We expected the 700 workflows to complete in 3 minutes, or at least within 5 minutes, but this is not the case. There is a delay of about 5 minutes to launch all 700 workflows, which is also fine; by that reasoning everything should complete within 10 minutes, but it does not.

What is actually happening is that when the 700 workflows are submitted, it takes around 5-6 minutes to launch all of them from Oozie (we are OK with this). The overall time to complete the 700 workflows is around 30 minutes, meaning a workflow that kicked off at 7:00 might not complete until 7:30. Yet the time taken by the actions themselves remains the same: the shell action still takes 50 seconds and the Spark job still takes 2-3 minutes. We see a delay in starting the shell action and the Spark job even though Oozie has already moved the workflow into the PREP state.

What we checked so far:

  • Initially we thought it was an Oozie issue and worked on its configuration.
  • Later we suspected YARN and tuned some of its configuration.
  • We also created separate queues, running the shell and launcher jobs in one queue and the Spark jobs in another (see the sketch after this list).
  • We have gone through the YARN and Oozie logs too.
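
The queue split we created is along these lines (a sketch of capacity-scheduler.xml; the queue names and the 30/70 split are assumptions, not our exact values):

```xml
<!-- capacity-scheduler.xml (sketch): two child queues under root -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>launchers,spark</value>
</property>
<property>
    <!-- queue for the Oozie launcher AMs and shell actions -->
    <name>yarn.scheduler.capacity.root.launchers.capacity</name>
    <value>30</value>
</property>
<property>
    <!-- queue for the actual Spark jobs -->
    <name>yarn.scheduler.capacity.root.spark.capacity</name>
    <value>70</value>
</property>
```

Each action then routes its launcher to the first queue (via `oozie.launcher.mapred.job.queue.name` in the action's `<configuration>`) and the Spark job itself to the second (e.g. `--queue spark` in `spark-opts`).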

Can someone throw some light on this?

Comments:
  • You need to do memory management in your Spark applications. Do you know the following? If yes, please answer: (a) cluster size and configuration, (b) how much memory is allocated to Spark, (c) number of cores and executors used, along with driver and executor memory – swapnil shashank Sep 04 '21 at 17:56
  • If you are running Spark in `client` mode, the Spark driver consumes memory on your master node, which restricts how many launchers Oozie can create. Try running the Spark job in `cluster` mode. [Read more on Spark client vs cluster mode](https://stackoverflow.com/questions/41124428/spark-yarn-cluster-vs-client-how-to-choose-which-one-to-use) – Snigdhajyoti Sep 05 '21 at 20:56
  • Thanks for the response. We are using an auto-scaling cluster with core nodes and task nodes; we also tried fixed nodes. The cluster scaling is not showing any anomalies. Almost 100 GB of memory is allocated to Spark on a given node. I'm submitting the application with 4 cores, executor memory of 12g, and 4 executors, in `cluster` mode. – ravi Sep 05 '21 at 21:59
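
For reference, the submission settings ravi quotes in the last comment would map to something like this inside the workflow's Spark action (a sketch; only the cores/memory/executor numbers come from the comment, the queue name is hypothetical):

```xml
<spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <!-- cluster mode: the driver runs in a YARN container, not on the master node -->
    <mode>cluster</mode>
    <name>sample-spark-job</name>
    <class>com.example.Main</class>
    <jar>${wfPath}/lib/job.jar</jar>
    <!-- 4 executors x 12g x 4 cores, per the comment; the queue name is an assumption -->
    <spark-opts>--num-executors 4 --executor-memory 12g --executor-cores 4 --queue spark</spark-opts>
</spark>
```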

0 Answers