
I have a short Python script that is supposed to:

  • be used by many users and threads concurrently,
  • call another program (/usr/bin/ld),
  • call this other program no more than x times concurrently (e.g. 2 concurrent calls to ld),
  • handle being interrupted / killed.

I managed to achieve most of this using a shared semaphore from the Python module posix_ipc. It handles SIGTERM and Ctrl+C (the semaphore is released), but it doesn't handle SIGKILL: the semaphore stays acquired and has to be reset manually. This means that running kill -9 on it twice disables it permanently (until a manual fix is applied).

How can I release the semaphore when the script is killed? If that is not possible, is there a different method to achieve a similar result?

I looked into file locks (assuming that the number of concurrent uses will always be 2): maybe I can have 2 files, try to lock the first, and if that fails, lock the other and wait until it is available. But I couldn't figure out how to do "try to lock; if somebody else has already locked it, do something else".
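For reference, the "try to lock, otherwise do something else" step can be sketched with the standard-library `fcntl.flock` and its `LOCK_NB` flag. This is only a minimal sketch, assuming Linux and a local filesystem; the helper name `try_lock` and the lock file path are made up for illustration.

```python
import fcntl
import os


def try_lock(path):
    """Return an fd holding an exclusive lock on `path`, or None if it is busy.

    `try_lock` is a hypothetical helper name, not part of any library.
    """
    # Read-only access is enough for flock(); the file content is never used.
    fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Somebody else holds the lock: give up on this file.
        os.close(fd)
        return None
    return fd
```

The caller keeps the returned fd open for as long as it needs the lock and closes it to release. The kernel also drops the lock when the process dies, including after kill -9.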

Full code of script:

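#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import posix_ipc
import signal
import subprocess
import sys

SEM_NAME = '/serialize_ld'
MAX_CONCURRENT = 1
PROGRAM = '/usr/bin/ld'


def main():
    # Clear the umask so the semaphore really gets mode 0o666 and other users can open it.
    os.umask(0)
    sem = posix_ipc.Semaphore(SEM_NAME, posix_ipc.O_CREAT, mode=0o666, initial_value=MAX_CONCURRENT)

    def exit_on_signal(signum, frame):
        print("exiting due to signal " + str(signum))
        # sys.exit raises SystemExit, so the finally block below releases the semaphore exactly once.
        sys.exit(1)

    # signal.signal takes one signal number per call; OR-ing the numbers yields a single number
    # (15 on Linux, i.e. only SIGTERM). SIGKILL cannot be caught, so it is not registered.
    for signum in (signal.SIGTERM, signal.SIGINT):
        signal.signal(signum, exit_on_signal)

    sem.acquire()
    try:
        subprocess.call([PROGRAM, *sys.argv[1:]])
    finally:
        sem.release()
        sem.close()


if __name__ == "__main__":
    main()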
MateuszL
  • From the Wiki on [signals](https://en.wikipedia.org/wiki/Signal_(IPC)): *The SIGKILL signal is sent to a process to cause it to terminate immediately (kill). In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.* There's nothing you can do from inside your process. – shmee Mar 27 '19 at 09:18
  • @shmee Yes, I know. So I am open to other mechanisms that don't have this problem. – MateuszL Mar 27 '19 at 09:21
  • This is why people should not use SIGKILL as a first choice – geckos Mar 27 '19 at 10:05
  • You can try to apply the manual intervention in the program, and have some means to determine if the program was killed abruptly – geckos Mar 27 '19 at 10:09
  • What are the manual means? – geckos Mar 27 '19 at 10:09
  • @geckos `sudo rm /dev/shm/sem.serialize_ld` (sudo not always needed) – MateuszL Mar 27 '19 at 11:18
  • @MateuszL You could have a small independent instance that manages the semaphore and a pid list, that your script asks for permission to run and reports back to when finished. The instance could increase and decrease the semaphore on behalf of your script and, if asked for run permission while the limit is reached, could check if the processes behind registered PIDs from previous calls are still running. If any of them is no longer found, the PID list could be updated with the PID from the new script execution and the caller could be granted permission to execute. – shmee Mar 27 '19 at 11:50
  • Do you really need sudo? Does the script run with sudo? If the script runs with enough permission, you can remove the file in the script, otherwise you have to resort to the @MateuszL option, which would add much more complexity than needed. – geckos Mar 27 '19 at 12:52
  • @geckos When user A runs the script first, semaphore is created with owner A. Because `/dev/shm` has sticky bit, only A can remove it. But for my use case using sudo for this is not a problem; if it becomes a problem, maybe I can get rid of sticky bit – MateuszL Mar 27 '19 at 12:57
  • I swapped the names, sorry. I would just try to delete the file, and if it failed, print the proper line like `sudo rm /dev/...` in the error, so that the user has a quick fix for the problem. There shouldn't be a lot of `kill -9 ..` anyway. If so, you have another problem that is causing people to shut your script down with such violence. :) – geckos Mar 27 '19 at 13:02
  • Adding server/client pid/shared semaphore manager looks like overkill to me – geckos Mar 27 '19 at 13:03
  • @geckos My proposal took into account that no more than two instances of the script should be running at the same time. While I agree that processes in general should not be `kill -9`ed, your approach of having either the script or the user delete the semaphore if it cannot be acquired disregards the possibility that the semaphore might be unavailable for good reasons, i.e. two running processes. It makes the use of a semaphore rather pointless. If you have a solution that keeps the premise and is not as overly complicated and overkill as mine, I'd be happy to learn from you ;) – shmee Mar 27 '19 at 14:19
  • You're right, I was treating the semaphore as a mutex; removing it would break running scripts. Implementing client/server still seems like great overkill to me here. You can check which processes are using the semaphore with the `fuser` or `lsof` commands. Processes being killed with -9 should be like 0.01% of the cases. – geckos Mar 27 '19 at 16:28
  • https://stackoverflow.com/questions/2053679/how-do-i-recover-a-semaphore-when-the-process-that-decremented-it-to-zero-crashe OP, see this thread. They talk about using file locks instead of semaphores, so that if a process dies, the lock is removed by the operating system. – geckos Mar 27 '19 at 16:31
  • In your case you will have to use 2 files, since you want two concurrent executions, which is a dirty, dirty workaround, but it will help you with the release – geckos Mar 27 '19 at 16:34
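
The two-file idea from the last comments could look roughly like the sketch below: one lock file per allowed concurrent run, each tried with a non-blocking `fcntl.flock`. This is an untested-in-production sketch, assuming Linux and a local filesystem. The names `LOCK_DIR`, `acquire_slot`, `run_limited`, the `serialize_ld.N.lock` file names and the polling interval are all made up for illustration.

```python
import fcntl
import os
import subprocess
import sys
import time

LOCK_DIR = "/tmp"  # hypothetical; any directory every user of the script can read
SLOTS = 2          # number of concurrent runs allowed


def _open_lock_file(path):
    # Open an existing file without O_CREAT first, so files created by
    # other users can still be opened read-only; create it only if missing.
    try:
        return os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)


def acquire_slot(lock_dir=LOCK_DIR, slots=SLOTS, poll=0.1):
    """Wait until one of the `slots` lock files is free and return its fd."""
    while True:
        for i in range(slots):
            fd = _open_lock_file(os.path.join(lock_dir, f"serialize_ld.{i}.lock"))
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd
            except BlockingIOError:
                os.close(fd)  # slot busy, try the next one
        time.sleep(poll)  # all slots busy: wait a bit and retry


def run_limited(argv, lock_dir=LOCK_DIR, slots=SLOTS):
    """Run `argv` while holding one slot; return its exit code."""
    fd = acquire_slot(lock_dir, slots)
    try:
        return subprocess.call(argv)
    finally:
        # Closing the fd releases the lock. If this process is killed with
        # SIGKILL, the kernel closes the fd and the slot frees itself.
        os.close(fd)
```

A wrapper would then call something like `sys.exit(run_limited(['/usr/bin/ld', *sys.argv[1:]]))`. One caveat of this sketch: if the wrapper is killed with -9, the slot is freed immediately even though the child ld may still be running for a while.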
