I am using Glue v3.0 (Python 3.7).

The goal is to stop the Glue PySpark job gracefully, so that it can terminate properly and clean up its custom resources before ending.

I tried to catch the stop/abort/kill signal using the classic signal.signal() method with a handler.

However, it doesn't seem to work with any of the signals listed in the Python signal module documentation (the handler never gets called).

Any idea how I can stop the job gracefully, whether with signals or some other way?

dng
  • I have provided an updated answer to a similar question here: [https://stackoverflow.com/a/72528222/16449395](https://stackoverflow.com/a/72528222/16449395) – jay-dono Jun 07 '22 at 09:52

1 Answer

You can call sys.exit(any_status_code). Just make sure that sys.exit is called before job.commit(), otherwise the job will fail.

Just out of curiosity, why would you want to terminate your job by yourself? After you call job.commit() the job should end anyways.


Edit:

Calling this should work:

import os

job.commit()
os._exit(0)  # os._exit() requires an exit status; 0 signals success
Robert Kossendey
  • Hi, could you give a bit more detail please? I don't understand why I should use sys.exit and job.commit (I have neither of them right now). To answer your question, here is my scenario: when I click "stop" on the Glue job, I don't want it to quit immediately while it is still processing. Instead, I would like the job to finish what it was doing and leave a clean environment (no tmp files, etc.). More broadly, this Glue job will be used in AWS Step Functions to run multiple instances of it in parallel, so the job can be stopped from Step Functions when an error occurs. – dng Sep 15 '21 at 12:54
  • Okay, this is not how Glue works. If you stop a job, it stops. Also, Glue is completely serverless: the job runs inside a container, so there are no tmp files etc. You only need job.commit() if you want to use Glue's bookmark feature. – Robert Kossendey Sep 15 '21 at 13:05
  • Thanks for your answer, Robert. I know Glue itself leaves no tmp files, but my job does: it processes files and creates some tmp files along the way. I don't think I need Glue's bookmark feature, since I just want the job to end properly (after it finishes what I want it to do). – dng Sep 15 '21 at 13:08
  • Okay, then you need something that listens to the Glue job's events and deletes the files when the job reaches the Failed state. The Glue job cannot delete them by itself. – Robert Kossendey Sep 15 '21 at 13:09
  • I see... So what I understand is: there is no way to end a Glue job gracefully (when running alone, not through Step Functions) :( – dng Sep 15 '21 at 13:15
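
The event-listener idea suggested in the comments could be sketched roughly as follows: a function, for example the body of a Lambda triggered by an EventBridge rule, that inspects a "Glue Job State Change" event and runs cleanup when the run did not finish normally. This is only a sketch under assumptions: the names `needs_cleanup`, `handler`, and the `cleanup` callback are hypothetical, and the actual file deletion (for example against S3) is left to the caller.

```python
# Terminal states after which leftover temporary files should be removed.
CLEANUP_STATES = {"FAILED", "STOPPED", "TIMEOUT"}


def needs_cleanup(event):
    """Return True if the event reports a Glue job run that ended abnormally."""
    detail = event.get("detail", {})
    return (
        event.get("detail-type") == "Glue Job State Change"
        and detail.get("state") in CLEANUP_STATES
    )


def handler(event, context=None, cleanup=None):
    """Entry point; `cleanup` is a hypothetical callback that deletes tmp files."""
    if not needs_cleanup(event):
        return False
    detail = event["detail"]
    if cleanup is not None:
        # Pass the job name and run id so the callback can locate the files.
        cleanup(detail.get("jobName"), detail.get("jobRunId"))
    return True
```

Injecting `cleanup` keeps the decision logic testable without AWS credentials; in a real deployment the callback would delete whatever prefix or directory the job wrote its tmp files to.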