I am building a process for launching user-built queries (business rules) using Scala-Spark/SQL. One of the requirements is that if a SQL query performs slower than expected (every rule has an expected-performance attribute, in seconds), I need to flag it as slow for future reference and also kill the long-running process/job.
So far I am thinking of the following approach:
- Start timing
- Launch the job in a Scala Future (i.e. on a separate thread)
- Wait up to the expected time for the job
- If the thread hasn't completed within the expected time, kill the job and report it as a slow process (a rough sketch of this idea follows the list)
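To make the idea concrete, here is a minimal sketch of what I have in mind, not a settled implementation. The object/method names, the `ruleId` parameter and the `count()` action are just placeholders; the only Spark-specific pieces are `setJobGroup` / `cancelJobGroup`, which tag and cancel all jobs submitted from the Future's thread.

```scala
import java.util.concurrent.TimeoutException

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

import org.apache.spark.sql.SparkSession

object RuleRunner {

  def runWithTimeout(spark: SparkSession, ruleId: String, sql: String, expectedSeconds: Long): Unit = {
    val sc = spark.sparkContext
    val jobGroup = s"rule-$ruleId"
    val start = System.nanoTime()

    val work = Future {
      // setJobGroup is thread-local, so it must be called on the thread that submits the job.
      sc.setJobGroup(jobGroup, s"business rule $ruleId", interruptOnCancel = true)
      // Placeholder blocking action; collect/write would work the same way.
      spark.sql(sql).count()
    }

    try {
      val rows = Await.result(work, expectedSeconds.seconds)
      val elapsedSec = (System.nanoTime() - start) / 1e9
      println(s"Rule $ruleId finished in $elapsedSec s ($rows rows)")
    } catch {
      case _: TimeoutException =>
        // Cancel every Spark job tagged with this group; running tasks get interrupted on the executors.
        sc.cancelJobGroup(jobGroup)
        println(s"Rule $ruleId exceeded $expectedSeconds s; flagging as slow and cancelling")
    }
  }
}
```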
I am concerned that I am interfering with the distributed nature of the job. Another concern: since Spark will internally launch an unknown number of tasks across nodes for my "job" (running that query), how will the timing work, and what actual performance figure will be reported back to my program?
Suggestions, please.