import json
import os
import subprocess
import sys
import tempfile


def func(item, protein, ncpu):
    output = None
    item_id = item.id
    # mkstemp() returns a (fd, path) tuple; only the path is kept here
    output_fname = tempfile.mkstemp(suffix='_output.json', text=True)[1]
    input_fname = tempfile.mkstemp(suffix='_input.pdbqt', text=True)[1]  # <-- error occurs here
    try:
        with open(input_fname, 'wt') as f:
            f.write(preprocess(item))  # <- convert item to text format, not important
        python_exec = sys.executable
        cmd = f'{python_exec} script.py -i {input_fname} -p {protein} -o {output_fname} -c {ncpu}'
        subprocess.run(cmd, shell=True)
        with open(output_fname) as f:
            res = f.read()
        if res:
            res = json.loads(res)
            output = {'score': res['score'],
                      'block': res['poses']}
    finally:
        os.unlink(input_fname)
        os.unlink(output_fname)
    return item_id, output
from functools import partial
from multiprocessing import Pool

# inside the docking() generator (see traceback below); protein and ncpu are passed via kwargs
with Pool(ncpu) as pool:
    for item_id, res in pool.imap_unordered(partial(func, **kwargs), tuple(items), chunksize=1):
        yield item_id, res
I process multiple items using multiprocessing.Pool. For every item I run a Python script in a subprocess shell. Beforehand, I create two temporary files and pass them as arguments to the script. script.py calls a C extension which processes an item. Afterwards, I parse the output JSON file and return the values, if any. The temporary files are destroyed in a finally block. However, after processing 3880-3920 items I get the following error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/pavlop/anaconda3/envs/vina_cache/lib/python3.9/multiprocessing/pool.py", line 125, in worker
File "/home/pavlop/python/docking-scripts/moldock/vina_dock.py", line 93, in func
OSError: [Errno 24] Too many open files: '/var/tmp/pbs.147815.login/tmpp8tqblfv_input.pdbqt'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/pavlop/python/docking-scripts/moldock/run_dock.py", line 230, in <module>
main()
File "/home/pavlop/python/docking-scripts/moldock/run_dock.py", line 203, in main
for i, (item_id, res) in enumerate(docking(mols,
File "/home/pavlop/python/docking-scripts/moldock/run_dock.py", line 74, in docking
for item_id, res in pool.imap_unordered(partial(func, **kwargs), tuple(items), chunksize=1):
File "/home/pavlop/anaconda3/envs/vina_cache/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
OSError: [Errno 24] Too many open files: '/var/tmp/pbs.147815.login/tmpp8tqblfv_input.pdbqt'
What am I doing wrong or missing? Why are the file descriptors not released? Could it be that the C extension does not release file descriptors?
I see that temporary files are created and removed as expected. ulimit (soft and hard) was set to 1000000. I checked all my code and all files are opened using a with statement to avoid leaks.
If I replace multiprocessing.Pool with a dask cluster, everything works as expected, with no errors.
UPDATE: I checked the output of lsof. Indeed, both temporary files remain open for every item, and they accumulate over time in every running process, with status (deleted). So the issue is in how I manage them. However, since the ulimit is large, I should not observe this error.
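A quick way to confirm the accumulation from inside a worker (Linux-only sketch; each entry in /proc/self/fd is an open descriptor of the current process):

import os

def count_open_fds():
    # number of file descriptors currently open in this process
    return len(os.listdir('/proc/self/fd'))

If the descriptors leak, calling this at the top of func should show the count growing by two per processed item, matching the two mkstemp() calls.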
UPDATE2: It seems that I have to close the descriptors manually. It worked on a test run; I still have to check it on a larger run.
fd, name = tempfile.mkstemp()
try:
    ...
finally:
    os.close(fd)
    os.unlink(name)
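Applied to func, the fix could look like the following sketch. os.fdopen() takes ownership of the descriptor returned by mkstemp(), so the with block closes it; the output descriptor is closed immediately because script.py opens the file by its name anyway:

import json
import os
import subprocess
import sys
import tempfile


def func(item, protein, ncpu):
    output = None
    item_id = item.id
    out_fd, output_fname = tempfile.mkstemp(suffix='_output.json', text=True)
    os.close(out_fd)  # script.py writes to the path, the descriptor is not needed
    in_fd, input_fname = tempfile.mkstemp(suffix='_input.pdbqt', text=True)
    try:
        # fdopen() wraps the existing descriptor, so the with block closes it
        with os.fdopen(in_fd, 'wt') as f:
            f.write(preprocess(item))
        cmd = f'{sys.executable} script.py -i {input_fname} -p {protein} -o {output_fname} -c {ncpu}'
        subprocess.run(cmd, shell=True)
        with open(output_fname) as f:
            res = f.read()
        if res:
            res = json.loads(res)
            output = {'score': res['score'],
                      'block': res['poses']}
    finally:
        os.unlink(input_fname)
        os.unlink(output_fname)
    return item_id, output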