
I have many Spark 1.6 applications that I want to run one after another (they read from and write to the same Hive tables) using YARN.

I tried specifying a common queue using `spark-submit --queue QUEUENAME ...`, but the applications still run in parallel.

Is there another way to ensure that only one application runs at a time (other than using a loop, e.g. in a bash script)?
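One idea I have not verified: if the cluster uses YARN's Fair Scheduler, a per-queue cap on concurrently running applications should force the jobs to run one at a time. A minimal sketch of what the `fair-scheduler.xml` entry might look like (the queue name is the same placeholder as above):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (Fair Scheduler allocation file) -- an unverified sketch.
     maxRunningApps caps how many applications from this queue may run at once,
     so submitting every job to QUEUENAME should serialize them. -->
<allocations>
  <queue name="QUEUENAME">
    <!-- only one application runs at a time; the rest stay pending -->
    <maxRunningApps>1</maxRunningApps>
  </queue>
</allocations>
```

This obviously would not apply if the cluster runs the Capacity Scheduler instead.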

Raphael Roth
  • wanting to run **cluster computing** jobs sequentially pretty much defeats the purpose of the framework ... you don't need Apache Spark for sequential execution, that doesn't make any sense. Write a bash script and be happy - that's what bash scripts are made for. – specializt Dec 12 '16 at 11:04
  • @specializt Thanks for the comment. I agree in general, but I have reasons to run them sequentially (of course each individual Spark application still runs distributed); I just don't want them all running at the same time (for which I have good reasons). – Raphael Roth Dec 12 '16 at 11:11
  • then collect all the jobs onto one single machine and run them there, [RDDs preserve order](http://stackoverflow.com/a/29301258/351861) – specializt Dec 12 '16 at 11:17
  • why can't you run this as an Oozie workflow job, where each Spark action runs sequentially? – Nirmal Ram Dec 12 '16 at 11:25 (a sketch of this is below, after the comments)
  • This might help you: https://community.hortonworks.com/questions/25580/scheduling-a-spark-submit-job-using-oozie.html – BruceWayne Dec 12 '16 at 11:36
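For illustration, here is a rough sketch of the Oozie approach suggested in the last two comments, with two Spark actions chained so that the second starts only after the first succeeds. All application names, classes, and jar paths below are made-up placeholders, and the snippet is untested:

```xml
<workflow-app name="sequential-spark-jobs" xmlns="uri:oozie:workflow:0.5">
  <start to="job-a"/>

  <!-- first Spark application -->
  <action name="job-a">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>JobA</name>
      <class>com.example.JobA</class>          <!-- placeholder main class -->
      <jar>${nameNode}/apps/job-a.jar</jar>    <!-- placeholder jar path -->
      <spark-opts>--queue QUEUENAME</spark-opts>
    </spark>
    <!-- job-b starts only after job-a finished successfully -->
    <ok to="job-b"/>
    <error to="fail"/>
  </action>

  <!-- second Spark application, same pattern -->
  <action name="job-b">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>JobB</name>
      <class>com.example.JobB</class>
      <jar>${nameNode}/apps/job-b.jar</jar>
      <spark-opts>--queue QUEUENAME</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```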

0 Answers