
I have a script that had been working properly for the past 3 months. The server went down last Monday, and since then the script has stopped working. It hangs at `coords = p.communicate()[0].split()`.

Here's a part of the script:

class SelectByLatLon(GridSelector):
    def __init__(self, from_lat, to_lat, from_lon, to_lon):
        self.from_lat = from_lat
        self.to_lat = to_lat
        self.from_lon = from_lon
        self.to_lon = to_lon

    def get_selection(self, file):
        p = subprocess.Popen(
            [
                os.path.join(module_root, 'bin/points_from_latlon.tcl'),
                file,
                str(self.from_lat), str(self.to_lat), str(self.from_lon), str(self.to_lon)
            ],
            stdout=subprocess.PIPE
        )
        coords = p.communicate()[0].split()
        return ZGridSelection(int(coords[0]), int(coords[1]), int(coords[2]), int(coords[3]))

When I run the script on another server, everything works just fine. Can I use something else instead of `p.communicate()[0].split()`?

jfs
MrGRafael
  • Looks like your tcl script is what is hanging. Fix that. – martineau Jul 15 '13 at 14:12
  • Does it 'hang' indefinitely on `communicate()`, i.e. does the subprocess just not exit (you should monitor that)? "Different" servers usually implies that many parts of the environment the program runs in are different. It could be that the (subprocess) program hangs because it expects input from stdin. Try opening a pipe to stdin via `stdin=subprocess.PIPE` and provide some input to the subprocess (e.g. a newline) via `p.communicate("\n")`. If that helps, we can later figure out what exactly triggered this difference. – Dr. Jan-Philip Gehrcke Jul 15 '13 at 14:13
  • Martineau, you are correct. The TCL script is causing the problem. I have no idea why. The same script has been working properly for the past 3 months. I'll try to figure it out. – MrGRafael Jul 15 '13 at 15:15
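
The stdin suggestion from the comments can be tried out along the following lines. This is only a minimal sketch for Python 3.3+ (where `communicate()` accepts a `timeout`), and it makes two assumptions that are not part of the original script: a short Python one-liner stands in for the tcl script, and the 10-second timeout is an arbitrary illustrative value.

```python
import subprocess
import sys

# Stand-in for the tcl script (assumption): it waits for one line on stdin
# before printing the coordinates, mimicking a script that blocks on input.
stand_in_cmd = [
    sys.executable, "-c",
    "import sys; sys.stdin.readline(); print('1 2 3 4')",
]

p = subprocess.Popen(stand_in_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
try:
    # Send a newline and give up after 10 seconds instead of hanging forever.
    out, _ = p.communicate(b"\n", timeout=10)
except subprocess.TimeoutExpired:
    p.kill()                  # stop the stuck child
    out, _ = p.communicate()  # collect whatever it wrote before being killed
coords = out.split()
```

If the real command returns promptly once it receives input, that points to stdin as the difference between the two servers; if the timeout fires, the child is stuck for some other reason.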

1 Answer


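Before the outage, you might have been running your server without daemonization, i.e., with functional stdin, stdout, and stderr streams. If it now runs as a daemon, the subprocess may inherit closed or unusable standard streams and block on them. To fix this, you could redirect the streams to DEVNULL for the subprocess: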

import os
from subprocess import Popen, PIPE

DEVNULL = os.open(os.devnull, os.O_RDWR)
p = Popen(tcl_cmd, stdin=DEVNULL, stdout=PIPE, stderr=DEVNULL, close_fds=True)
os.close(DEVNULL)
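
On Python 3.3+, the same redirection can be written with `subprocess.DEVNULL`, which avoids opening and closing the null device by hand. The sketch below is illustrative rather than a drop-in replacement: `run_with_null_streams` is a hypothetical helper name, and a Python one-liner stands in for the tcl command so that the example can be run as is.

```python
import subprocess
import sys


def run_with_null_streams(cmd):
    """Run cmd with stdin and stderr on the null device; return stdout split into fields."""
    p = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # child sees immediate EOF instead of blocking on stdin
        stdout=subprocess.PIPE,     # capture the coordinates
        stderr=subprocess.DEVNULL,  # discard diagnostics written to stderr
        close_fds=True,
    )
    out, _ = p.communicate()
    return out.split()


# Stand-in for the tcl command (assumption): prints four integers.
stand_in_cmd = [sys.executable, "-c", "print('1 2 3 4')"]
coords = run_with_null_streams(stand_in_cmd)
```

Note that discarding stderr also hides any error message from the child, so while debugging it may be more useful to send stderr to a log file instead.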

.communicate() may wait for EOF on stdout even if tcl_cmd already exited: the tcl script might have spawned a child process that inherited the standard streams and outlived its parent.
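
This effect can be reproduced with a small experiment. The sketch assumes a POSIX system and Python 3, and uses Python one-liners as stand-ins for the tcl script and the process it spawns: the direct child prints its output and exits at once, but it leaves behind a grandchild that sleeps for one second while holding the inherited stdout pipe.

```python
import subprocess
import sys
import time

# The parent prints a line, starts a grandchild that sleeps for 1 second
# while inheriting the parent's stdout, and then exits immediately.
parent_code = (
    "import subprocess, sys\n"
    "print('1 2 3 4')\n"
    "sys.stdout.flush()\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])\n"
)

start = time.monotonic()
p = subprocess.Popen([sys.executable, "-c", parent_code], stdout=subprocess.PIPE)
out, _ = p.communicate()  # returns only once every holder of the pipe has closed it
elapsed = time.monotonic() - start
```

Here `elapsed` comes out at roughly one second or more even though the direct child finished almost immediately, because `communicate()` reads until EOF and EOF arrives only when the grandchild exits.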

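If you know that you don't need any further output on stdout after tcl_cmd exits, you could kill the whole process tree as soon as you detect that tcl_cmd is done.

To be able to kill the whole process tree, you might need to start tcl_cmd in its own process group, using start_new_session=True (Python 3.2+) or an analog such as preexec_fn=os.setsid on Python 2, so that p.pid is also the process group ID: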

import os
import signal
from threading import Timer

def kill_tree_on_exit(p):
    p.wait() # wait for tcl_cmd to exit
    os.killpg(p.pid, signal.SIGTERM)

t = Timer(0, kill_tree_on_exit, [p])
t.start()
coords = p.communicate()[0].split()
t.cancel()

See How to terminate a python subprocess launched with shell=True
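
Putting the pieces together, a complete version might look like the sketch below. It assumes a POSIX system and Python 3.3+, uses a plain `Thread` in place of `Timer(0, ...)`, and again substitutes Python one-liners for the tcl script: the stand-in prints the coordinates, leaves a 30-second grandchild attached to stdout, and exits.

```python
import os
import signal
import subprocess
import sys
from threading import Thread

# Stand-in for the tcl command (assumption): prints the coordinates, leaves a
# long-running grandchild holding the inherited stdout, then exits.
parent_code = (
    "import subprocess, sys\n"
    "print('1 2 3 4')\n"
    "sys.stdout.flush()\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'])\n"
)


def kill_tree_on_exit(p):
    p.wait()  # wait for the direct child to exit
    try:
        # The process group ID equals p.pid because of start_new_session=True.
        os.killpg(p.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # no process is left in the group


p = subprocess.Popen(
    [sys.executable, "-c", parent_code],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.PIPE,
    start_new_session=True,  # child becomes leader of a new session and process group
)
watcher = Thread(target=kill_tree_on_exit, args=(p,), daemon=True)
watcher.start()
coords = p.communicate()[0].split()  # returns once the grandchild has been killed
watcher.join()
```

Without the watcher thread, `communicate()` in this example would block for the full 30 seconds. The trade-off is that anything the descendants would have written after the direct child exits is lost.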

jfs