I have a complex Python script. Inside a loop I call a function that uses multiprocessing, and inside that function I call an external program (pdfinfo) with subprocess.Popen.
While my program runs, I can see VIRT memory (in top) steadily increasing until, after some time, the system runs out of memory and shows this message:
    Traceback (most recent call last):
      File "classify_pdf.py", line 603, in <module>
        preprocessing_list[loop] = da.get_preprocessing_data(batch_files, metadata, cores)
      File "/home/student/.../src/data.py", line 87, in get_preprocessing_data
        properties = fp.pre_extract_pdf_properties(batch_files, cores)
      File "/home/student/.../src/features/pdf_properties.py", line 73, in pre_extract_pdf_properties
        pool = Pool(num_cores)
      File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
        context=self.get_context())
      File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
        self._repopulate_pool()
      File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
        w.start()
      File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
        self._popen = self._Popen(self)
      File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
        return Popen(process_obj)
      File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
        self._launch(process_obj)
      File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
        self.pid = os.fork()
    OSError: [Errno 12] Cannot allocate memory
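The traceback shows a new Pool(num_cores) being created from inside the loop (classify_pdf.py line 603 calls get_preprocessing_data, which ends up calling Pool). One option worth trying is to create the pool once and reuse it for every batch, so the script does not fork a fresh set of workers on each iteration. This is a minimal sketch, not the original code: `square` is a hypothetical stand-in for the real worker, and `maxtasksperchild=100` is an arbitrary example value for the real Pool parameter that replaces a worker after it completes that many tasks. The 'fork' start method is pinned because the popen_fork.py frames show that is what this Linux setup already uses.

```python
import multiprocessing


def square(x):
    # Hypothetical stand-in for the real per-file worker function.
    return x * x


def process_batches(batches, num_cores):
    """Run every batch through one long-lived pool instead of one pool per batch."""
    # 'fork' matches the popen_fork.py frames in the traceback (Linux).
    ctx = multiprocessing.get_context("fork")
    results = []
    # maxtasksperchild=100 is an arbitrary example value: each worker exits
    # and is replaced by a fresh process after completing that many tasks.
    with ctx.Pool(num_cores, maxtasksperchild=100) as pool:
        for batch in batches:
            results.append(pool.map(square, batch))
    # Leaving the with-block calls pool.terminate(), stopping the workers.
    return results


if __name__ == "__main__":
    print(process_batches([[1, 2], [3]], 2))  # [[1, 4], [9]]
```

In the real script, the loop over batches would move inside the with-block and call pool.map with pdfinfo_get_pdf_properties.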
After interrupting the process with Ctrl-C, many python processes are still running (I list them with ps aux | grep python). There are thousands of them, and they remain even after I close my session on the server and log back in.
    user1+ 53872 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
    user1+ 53873 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
    user1+ 53876 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
How can so many processes still be alive after I interrupt my script? Does it have something to do with using multiprocessing with a subprocess call inside a loop? Does the fork done by Popen create additional processes, and if so, why don't they exit?
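One way to narrow this down on Linux is to look at the parent PID of the leftover processes: when a process's parent exits, the process is re-parented (usually to PID 1), so a PPID of 1 suggests orphaned workers rather than children of a script that is still running. A minimal sketch that reads /proc directly; `list_processes` is a hypothetical helper, and the {pid: (ppid, cmdline)} result shape is just an illustrative choice.

```python
import os


def list_processes(pattern):
    """Return {pid: (ppid, cmdline)} for processes whose command line contains pattern.

    Linux-only: reads /proc/<pid>/cmdline and /proc/<pid>/stat.
    """
    found = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                raw_cmd = f.read()
            with open("/proc/%s/stat" % entry, "rb") as f:
                stat = f.read().decode(errors="ignore")
        except OSError:
            # The process exited during the scan, or access was denied.
            continue
        cmdline = raw_cmd.replace(b"\0", b" ").decode(errors="ignore").strip()
        # /proc/<pid>/stat is "pid (comm) state ppid ..."; split after the last
        # ')' because the command name itself may contain spaces or parentheses.
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        if pattern in cmdline:
            found[int(entry)] = (ppid, cmdline)
    return found


if __name__ == "__main__":
    for pid, (ppid, cmd) in sorted(list_processes("classify_pdf.py").items()):
        print(pid, ppid, cmd)
```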
BTW, this is the part of the code where it happens:
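    from multiprocessing import Pool
    from os.path import basename, splitext

    pool = Pool(num_cores)
    res = pool.map(pdfinfo_get_pdf_properties, files)
    pool.close()
    pool.join()
    # each x is (properties, file_path); key by file name without extension
    res_fix = {}
    for x in res:
        res_fix[splitext(basename(x[1]))[0]] = x[0]
    return res_fix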
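A minimal sketch of the same function with the pool managed by a with-block. Pool's context manager calls terminate() on exit, so the workers are stopped even when map() raises, including on KeyboardInterrupt in the parent; with plain close()/join(), an exception inside map() skips both calls. `get_properties` is a hypothetical stand-in for pdfinfo_get_pdf_properties that returns (properties, path), matching how the original loop uses x[0] and x[1]. The 'fork' start method is pinned to match the Linux setup in the traceback.

```python
import multiprocessing
from os.path import basename, splitext


def get_properties(file_path):
    # Hypothetical stand-in for pdfinfo_get_pdf_properties: returns
    # (properties, file_path), matching how the original loop uses x[0] and x[1].
    return {"path_length": len(file_path)}, file_path


def pre_extract_pdf_properties(files, num_cores):
    ctx = multiprocessing.get_context("fork")
    # Pool.__exit__ calls terminate(), so the workers are stopped even if
    # map() raises, e.g. on KeyboardInterrupt in the parent process.
    with ctx.Pool(num_cores) as pool:
        res = pool.map(get_properties, files)
    # Key each result by the file name without directory or extension.
    return {splitext(basename(path))[0]: props for props, path in res}


if __name__ == "__main__":
    print(pre_extract_pdf_properties(["/data/a.pdf", "/data/b.pdf"], 2))
```

Because map() has already returned all results before the block exits, calling terminate() on the normal path loses nothing.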
and inside pdfinfo_get_pdf_properties I call:
    import subprocess

    output = subprocess.Popen(["pdfinfo", file_path],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE).communicate()[0].decode(errors='ignore')
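On Python 3.5+ the same call can be written with subprocess.run(), which waits for the child (so it is reaped) and accepts a timeout. When the timeout expires, run() kills the child and waits for it before raising TimeoutExpired, so a hung pdfinfo cannot block a pool worker indefinitely. This is a sketch under assumptions: `run_pdfinfo`, the 30-second default timeout, and returning an empty string on failure are illustrative choices, and the `cmd` parameter exists only so the sketch can be tried without pdfinfo installed.

```python
import subprocess


def run_pdfinfo(file_path, timeout=30, cmd="pdfinfo"):
    """Run an external tool on one file and return its decoded stdout.

    Hypothetical helper: timeout=30 and the empty-string fallback are
    illustrative assumptions, not pdfinfo requirements.
    """
    try:
        result = subprocess.run(
            [cmd, file_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # on expiry run() kills and reaps the child
        )
    except (subprocess.TimeoutExpired, OSError):
        # Hung tool, or tool not found (FileNotFoundError is an OSError).
        return ""
    return result.stdout.decode(errors="ignore")
```

Inside pdfinfo_get_pdf_properties this would replace the Popen line with `output = run_pdfinfo(file_path)`.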