
I'm using Queue objects to collect output from Process workers in Python's multiprocessing module, and I'm doing my best to close and clean up each Queue when I'm done with it. Here's some code which nevertheless dies at some point with "too many open files". What more can I do to clean up completed jobs/queues so that I can run as many as I like?

# The following [fails to] demonstrate how to clean up jobs and queues
# (the queues are the key?) to avoid the OSError of too many open files.
from multiprocessing import Process, Queue, cpu_count

def dummy(inv, que):
    que.put(inv)
    return 0

nTest = 2800
queues = [None for ii in range(nTest)]
for ii in range(nTest):
    queues[ii] = Queue()
    job = Process(target=dummy, args=[ii, queues[ii]])
    job.start()
    print('Started job %d' % ii)
    job.join()
    print('Joined job %d' % ii)
    job.terminate()
    print('Terminated job %d' % ii)
    queues[ii].close()

Because it's an OSError, there is no specific line in my code which causes the problem. The report looks like this:

...
Terminated job 1006
Started job 1007
Joined job 1007
Terminated job 1007
Started job 1008
Joined job 1008
Terminated job 1008
Started job 1009
Joined job 1009
Terminated job 1009
---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)
<ipython-input-2-5f057cd2fe88> in <module>()
----> 1 breaktest()

... in breaktest()

/usr/lib64/python2.6/multiprocessing/__init__.pyc in Queue(maxsize)

/usr/lib64/python2.6/multiprocessing/queues.pyc in __init__(self, maxsize)

/usr/lib64/python2.6/multiprocessing/synchronize.pyc in __init__(self)

/usr/lib64/python2.6/multiprocessing/synchronize.pyc in __init__(self, kind, value, maxvalue)

OSError: [Errno 24] Too many open files
> /usr/lib64/python2.6/multiprocessing/synchronize.py(49)__init__()
CPBL
  • Which line raises the error? – Finch_Powers Nov 13 '15 at 14:08
  • I added some information which hopefully is meaningful for telling which line raises the error. – CPBL Nov 13 '15 at 14:17
  • Are you sure you don't have... too many open files? Are you opening files but forgetting to close them? – Thomas Nov 13 '15 at 14:18
  • Do you have a sequence of print "Started" / "Join" / "Terminated" ? On my system if I set nTest to a high value it crashes over [None for ii in range(nTest)] with "MemoryError". – Finch_Powers Nov 13 '15 at 14:23
  • Sorry. Key context missing (now added). Yes, I get those lines. On one computer, it dies every time after 1000 (or so!?) jobs. On my laptop it dies after 505 (or so!?) jobs. There are no other files being opened other than whatever is a "file" in this code. – CPBL Nov 13 '15 at 14:58
  • I noticed you are using python 2.6. Have you tried with 2.7? Also, your queues are still referenced in the queues list. Try removing them from there to have them go out of scope / no more reference on them. – Finch_Powers Nov 13 '15 at 15:14
  • Yes, It's 2.7.9 on the other machine. How shall I remove them? I have already tried del queues[ii] – CPBL Nov 13 '15 at 15:15
  • del will break your ranges. Use "queues[ii] = None". – Finch_Powers Nov 13 '15 at 15:30
  • @Finch_Powers, your solution (=None) appears to work in my toy example, above, so that seems like the answer. (However, it does not solve the problem in my slightly more involved code.) – CPBL Nov 13 '15 at 23:31

2 Answers


Your script fails after roughly 1000 tasks because that is about the default limit on open file descriptors for a single process (typically 1024).

Queues are implemented on top of Pipes, which hold file descriptors. Pipes are normally released by the garbage collector, but because you keep references to the Queues in a list, they are never collected; the file descriptors leak until your process reaches the 1024 limit, and then it crashes.

Do you have any need to store the Queues in a list?
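
As a quick way to see this happening (a minimal diagnostic sketch, assuming a Unix system; the /proc/self/fd count is Linux-specific), you can compare the number of descriptors the process currently holds against its limit:

import os
import resource

# Soft and hard limits on open file descriptors for this process (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('fd limit: soft=%d, hard=%d' % (soft, hard))

# On Linux, each entry in /proc/self/fd is one descriptor this process holds.
print('open fds: %d' % len(os.listdir('/proc/self/fd')))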

noxdafox

Simply replacing queues[ii].close() with queues[ii] = None in the code of the problem statement avoids the error shown (thanks to @Finch_Powers in the comments).
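
For completeness, here is the toy example from the question with only that change applied (as in the question, this assumes a fork-based start method such as on Linux, and nothing is ever read back from the queues):

from multiprocessing import Process, Queue

def dummy(inv, que):
    que.put(inv)
    return 0

nTest = 2800
queues = [None for ii in range(nTest)]
for ii in range(nTest):
    queues[ii] = Queue()
    job = Process(target=dummy, args=[ii, queues[ii]])
    job.start()
    print('Started job %d' % ii)
    job.join()
    print('Joined job %d' % ii)
    job.terminate()
    print('Terminated job %d' % ii)
    # Drop the only reference instead of calling close(); with nothing
    # referring to it, the Queue and the pipe/semaphore file descriptors
    # behind it can be garbage collected.
    queues[ii] = None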

However, I had more related problems (which would be a separate question), and the more general solution to my real problem (which motivated the toy example in my post) was to be careful that no loop variable, nor any other object, keeps a reference to a queue once I'm done with it. That, combined with setting the list element to None when I'm finished with each queue, and possibly with manually calling gc.collect(), results in each queue being properly destroyed (garbage collected). See python multiprocessing: some functions do not return when they are complete (queue material too big)
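
Here is a minimal sketch of that pattern; the names (worker, results) are illustrative rather than taken from my real code:

import gc
from multiprocessing import Process, Queue

def worker(ii, que):
    que.put(ii * ii)

results = []
for ii in range(2800):
    que = Queue()
    job = Process(target=worker, args=(ii, que))
    job.start()
    results.append(que.get())  # drain the queue before joining the process
    job.join()
    # Release every reference to the Queue before collecting it, so its
    # file descriptors are actually closed:
    que = None
    job = None
    gc.collect()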

The actual code where this helped is the runFunctionsInParallel function in https://gitlab.com/cpbl/cpblUtilities/blob/master/parallel.py

CPBL
  • current url: https://gitlab.com/cpbl/cpblutilsthree/blob/master/parallel.py – Cícero Alves Jun 24 '19 at 17:30
  • @CíceroAlves: Thank you! Actually, I've moved the (Python 3 version of the) repo back to where it was in my original URL, so it is correct as listed in cpblUtilities. – CPBL Jun 25 '19 at 14:18
  • It would be much more helpful if you showed a minimal example that solves this problem; the gitlab example does a lot and it's tough to understand which part of the code exactly solves it. – Preethi Vaidyanathan Jan 27 '21 at 15:41
  • Thanks @P.V.: I edited the answer so that it starts off with the explicit solution. I also linked to a more general issue/question. – CPBL Jan 27 '21 at 22:51