I am trying to execute a Spark job on a remote machine over an SSH connection.

There have been instances where the job failed but the scheduler marked it as "success", so I want to check the return code of spark-submit so that I can forcefully fail the task.

Below is the code I'm using

import StringIO

import paramiko

def execute_XXXX():
    # 'logger' is assumed to be configured elsewhere via the logging module.
    # Load the private key for key-based authentication.
    f = open('linux.pem', 'r')
    s = f.read()
    keyfile = StringIO.StringIO(s)
    mykey = paramiko.RSAKey.from_private_key(keyfile)

    # Connect to the remote host and run spark-submit there.
    sshcon = paramiko.SSHClient()
    sshcon.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    sshcon.connect('XXX.XXX.XXX.XXX', username='XXX', pkey=mykey)

    # exec_command returns (stdin, stdout, stderr) in that order.
    stdin, stdout, stderr = sshcon.exec_command('spark-submit XXX.py')

    logger.info("XXX ------>" + str(stdout.readlines()))
    logger.info("Error--------->" + str(stderr.readlines()))

How do I get the return code of the spark-submit job so that I can forcefully fail the task? Alternatively, could you suggest another solution?

Thanks, Chetan

2 Answers

This is how I solved the issue I was facing: a single added line of code was enough.

import StringIO
import sys

import paramiko

def execute_XXXX():
    # Load the private key and open an SSH connection to the remote host.
    f = open('linux.pem', 'r')
    s = f.read()
    keyfile = StringIO.StringIO(s)
    mykey = paramiko.RSAKey.from_private_key(keyfile)
    sshcon = paramiko.SSHClient()
    sshcon.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    sshcon.connect('XXX.XXX.XXX.XXX', username='XXX', pkey=mykey)

    # exec_command returns (stdin, stdout, stderr) in that order.
    stdin, stdout, stderr = sshcon.exec_command('spark-submit XXX.py')

    # recv_exit_status() blocks until the remote command finishes and returns
    # its exit code; a non-zero value means spark-submit failed.
    if stdout.channel.recv_exit_status() != 0:
        logger.info("XXX ------>" + str(stdout.readlines()))
        logger.info("Error--------->" + str(stderr.readlines()))
        sys.exit(1)
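
A note on this approach: recv_exit_status() blocks until the remote command has finished, and a job that writes a lot of output can fill the SSH channel buffer and stall before the exit status is ever delivered. Below is a minimal sketch of a variant that reads stdout/stderr first and raises an exception instead of calling sys.exit; the helper name, host, user, key path and script name are placeholders, as in the question.

import paramiko

def run_spark_submit(host, user, key_path, command='spark-submit XXX.py'):
    # Hypothetical helper; host, user, key_path and command are placeholders.
    mykey = paramiko.RSAKey.from_private_key_file(key_path)
    sshcon = paramiko.SSHClient()
    sshcon.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    sshcon.connect(host, username=user, pkey=mykey)

    # exec_command returns (stdin, stdout, stderr) in that order.
    stdin, stdout, stderr = sshcon.exec_command(command)

    # Read the output streams first so a chatty job cannot fill the
    # channel buffer and hang before the exit status arrives.
    out = stdout.readlines()
    err = stderr.readlines()

    # Blocks until the remote command has finished.
    exit_code = stdout.channel.recv_exit_status()
    sshcon.close()

    if exit_code != 0:
        raise RuntimeError("spark-submit failed with exit code %d:\n%s"
                           % (exit_code, ''.join(err)))
    return out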

You need to implement a SparkListener. More information can be found at the link below.

How to implement custom job listener/tracker in Spark?
