
I am getting confused about Spark jobs, stages, and tasks.

I understand that two stages and tasks can run in parallel. But since all my development has been on a standalone cluster, I have this doubt: can Spark run two jobs in parallel? When I open the Event Timeline on the Jobs page, I never see two jobs running in parallel/overlapping. Thanks!

Amit Kumar
    As long as your cluster has free resources, you can run as many jobs as you want in parallel. How jobs compete for resources depends on the cluster configuration. – abalcerek Feb 21 '18 at 13:59
  • Can this be done on a standalone cluster running in pseudo-distributed mode? – Amit Kumar Feb 21 '18 at 14:01
  • The answer to your latter question is also in @abalcerek's comment, even though it doesn't make much sense and is even bad practice. So unless you have a resource manager, there is no purpose in doing so. – eliasah Feb 21 '18 at 14:08
  • see also https://stackoverflow.com/questions/48838380/how-can-i-parallelize-multiple-datasets-in-spark/48845764#48845764 – Raphael Roth Feb 21 '18 at 20:24
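
For the case the last comment links to, where two jobs should overlap within a single application, the actions have to be submitted concurrently (for example from separate threads); otherwise the driver issues them one after another and the Event Timeline never shows overlap. A minimal sketch, assuming hypothetical dataset names and enough free cores:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Minimal sketch: two actions launched from separate threads produce two
// Spark jobs that can overlap in the Event Timeline, given free cores.
val spark = SparkSession.builder().appName("parallel-jobs-sketch").getOrCreate()

val dsA = spark.range(0L, 100000000L)  // hypothetical dataset A
val dsB = spark.range(0L, 100000000L)  // hypothetical dataset B

// Each count() is an action and therefore starts its own Spark job.
val jobA = Future { dsA.count() }
val jobB = Future { dsB.count() }

println(Await.result(jobA, Duration.Inf))
println(Await.result(jobB, Duration.Inf))
```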

1 Answer


Yes, you can run two jobs in parallel in standalone mode. It basically comes down to memory. If your server has 8 GB of memory, cap each application's allocation at around 3 GB; then, when you submit a second job, the server can still accept it, because the second job needs 3 GB and the server still has 4-5 GB free. If the server has no free memory, the second job goes into a pending state instead.
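
As a concrete illustration of this capping, here is a minimal sketch; the master URL, app name, and sizes are placeholder assumptions, not values from the question:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: cap this application's share of the standalone cluster so a
// second application submitted to the same master can still acquire resources.
// Master URL, app name, and sizes below are assumed placeholder values.
val spark = SparkSession.builder()
  .appName("capped-app-1")
  .master("spark://master-host:7077")      // assumed standalone master URL
  .config("spark.executor.memory", "3g")   // memory per executor for this app
  .config("spark.cores.max", "4")          // total cores this app may claim
  .getOrCreate()

// Submit a second application configured the same way; as long as memory and
// cores remain free, both run at the same time. If nothing is free, the
// second application waits (shows as pending/WAITING on the master UI).
```

Note that without `spark.cores.max`, a standalone-mode application claims all available cores by default, which is one common reason a second job ends up pending.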

Sahil Desai