2

With regard to this post: Python del Statement,

I recently encountered the following snippet:

# custom_process.py

import threading
import subprocess

myList = []  # module-wide list

class Foo(threading.Thread):

    myprocess = None     
    returncode = None

    def run(self):
        self.myprocess = subprocess.Popen(...)

        global myList
        myList.append(self.myprocess)

        ...  # Code skipped for brevity

        self.returncode = self.myprocess.returncode
        tmp1, tmp2 = self.myprocess.communicate()

        ...  # Code skipped for brevity

        del self.myprocess

Upon successive calls to Foo's run method, this code would exhaust the available file descriptors on the system and throw the exception: too many open files.

Therefore I was wondering: when dealing with subprocess objects, do the file descriptors close along with the actual OS process, or when the reference count of the Python subprocess object becomes zero?
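
To make this concrete, here is a rough diagnostic sketch of the kind of thing I would run to watch the descriptors pile up; it assumes Linux (where /proc/self/fd lists the current process's open descriptors) and uses a dummy command in place of my real one:

import os
import subprocess

def open_fd_count():
    # Number of file descriptors currently held by this process (Linux only).
    return len(os.listdir('/proc/self/fd'))

procs = []
print("before: %d" % open_fd_count())
for _ in range(5):
    # Each Popen created with pipes keeps descriptors open in the parent
    # until they are closed or the object is reclaimed.
    procs.append(subprocess.Popen(['true'],
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE))
print("after 5 Popen calls: %d" % open_fd_count())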

Thanks in advance.

stratis
  • Why are you storing it in `myList`? – thefourtheye May 20 '14 at 08:32
  • @thefourtheye For no purpose. I now know I shouldn't. However, storing it in that list resulted in massive errors, which now give me the chance to better understand how Python works. And that's where you come in. Never mind why I put it there in the first place. What I care about is learning why this was giving me errors. – stratis May 20 '14 at 08:37
  • @Konos5 - you don't have to declare *myList* as *global* - the interpreter will find it in the outer scope if it fails to find it in the local scope. You'll need the *global* statement if you use it this way: **myList += [self.myprocess]** - but this is (in most cases) nearly as bad a practice as using *global* – volcano May 20 '14 at 10:15
  • have you tried `close_fds=True` if you are on *nix? – jfs May 21 '14 at 18:08

5 Answers

2

Yes, del will only remove a reference to the object (its name, so to speak). To delete the item from the list you need a different syntax.

>>> a = 10 
>>> l = [a]
>>> del a
>>> a                  # the name a will be gone now
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  NameError: name 'a' is not defined
>>> l                  # but the list will still contain 10
[10]
>>> del l[0]
>>> l                  # but now it is gone
[]
>>> 

Alternatively, you could call l.remove(a) (while the name a still exists) to remove the object from the list.

Hans Then
  • Good point. However, my question now becomes: when dealing with subprocess objects (and not integers as in your example), do the file descriptors close along with the actual OS process, or when the reference count of the Python subprocess object becomes zero? – stratis May 20 '14 at 09:49
  • The file descriptors associated with the process belong to the subprocess object. It is possible for a file to stay open after the original process has died, so you need to clean up the subprocesses. – Hans Then May 20 '14 at 12:06
1

Popen starts a new OS process and returns a Python object that can be used to interact with it.

The two are still separate though, and deleting the Python object doesn't necessarily affect the separate, still-running process; it just deletes a convenient way of interacting with it.

It's perfectly reasonable for the Python Popen object to stick around after the process has terminated, or vice versa.

The behaviour will depend on the arguments to Popen though, e.g. a process can be configured to keep running even after this entire script terminates.

If it's a one-off process, it should terminate after the communicate() call, but you should check the return code after this call to verify (or use poll() to check whether it's alive).

It may also take some time to terminate, and if you're calling this function e.g. 1000s of times per second, you may be starting processes faster than they terminate, leading to FD exhaustion.
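
If you do keep references to finished processes around (e.g. in a list), a minimal clean-up sketch could look like the following; `cleanup` is just an illustrative helper name, and it assumes the process was started with stdout/stderr pipes:

def cleanup(proc):
    """Release a Popen object's descriptors before dropping the last reference."""
    for pipe in (proc.stdin, proc.stdout, proc.stderr):
        if pipe is not None:
            pipe.close()          # close the parent's end of the pipe
    if proc.poll() is None:       # still running?
        proc.terminate()          # ask it to exit
    proc.wait()                   # reap the child so it isn't left as a zombie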

Dave Challis
  • Regardless of the rate at which I am calling this function, shouldn't this lead to FD exhaustion anyway? If the returned Python object used to interact with the process stays indefinitely in `myList`, shouldn't the FDs stay open as well, or should they close when the OS process returns? – stratis May 20 '14 at 09:34
  • Again, I think it might depend on the args to Popen, but to be sure of avoiding any FDs staying open, you can call e.g. `myprocess.stderr.close(); myprocess.stdout.close(); myprocess.terminate()` - that should (if I recall correctly) ensure all FDs used by the object are closed; then you can have 1000s of those hanging around in a list without issue. – Dave Challis May 20 '14 at 10:09
  • Tested out your suggestion and it works fine. Therefore I conclude that I can either let `myprocess` finish without assigning it to any lists/variables, which will eventually implicitly close its FDs, or I can assign it to lists/variables but explicitly close its FDs before deleting its reference from my namespace. – stratis May 20 '14 at 10:47
1

I found an excellent blog post describing exactly the situation I was facing:

How subprocess and file descriptors work in Python

It breaks down the problem intuitively by emphasizing the differences in file handling between *nixes and Windows. The author even proposes a solution! Definitely worth a read.
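
I won't copy the post here, but the `close_fds=True` idea that jfs mentioned in the comments above fits the same picture; a rough sketch, with a placeholder command, of what that looks like:

import subprocess

# Sketch: on POSIX, close_fds=True stops the child from inheriting the
# parent's other open descriptors (the command here is just a placeholder).
proc = subprocess.Popen(['some_command'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        close_fds=True)
out, err = proc.communicate()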

stratis
0

The best way to avoid fd exhaustion while keeping a list of processes is to use a process Pool. That way you can limit the number of processes open simultaneously, with a new process starting only when a previous one has finished.

https://docs.python.org/3.4/library/multiprocessing.html#using-a-pool-of-workers

Of course, both other answers are right about reference deleting and why you end up with fd exhaustion, which is why I'm not repeating it here.
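
A rough sketch of that idea; the worker function, command list and pool size are only placeholders:

import subprocess
from multiprocessing import Pool

def run_one(cmd):
    # Each worker starts one child, waits for it and returns its exit code;
    # communicate() also closes the pipes, so no descriptors are left behind.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    proc.communicate()
    return proc.returncode

if __name__ == '__main__':
    commands = [['echo', str(i)] for i in range(100)]
    pool = Pool(processes=4)        # never more than 4 children at once
    return_codes = pool.map(run_one, commands)
    pool.close()
    pool.join()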

zmo
0

You may use myprocess.kill() or myprocess.terminate() to close the running process if you do not need it when you exit the run() method (your snippet implies that you don't).

If you also don't add it as an attribute and don't add it to myList (in your code that is redundant anyway), it will be destroyed soon after you exit run(). In Python you usually count on the garbage collector; actual use of the del statement is rare.
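
A sketch of what run() could look like with those points applied (['some_command'] stands in for the original, elided Popen arguments):

import subprocess
import threading

class Foo(threading.Thread):

    returncode = None

    def run(self):
        # keep the Popen object local: no self.myprocess, no myList
        proc = subprocess.Popen(['some_command'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()   # waits for the child, closes the pipes
        self.returncode = proc.returncode
        # if the child had to be stopped early instead, proc.terminate() or
        # proc.kill() could be called here; either way no del is needed -
        # proc simply goes out of scope and the garbage collector reclaims it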

volcano