
I wonder why subprocesses keep so many files open. I have an example in which some files seem to remain open forever (after the subprocess finishes and even after the program crashes).

Consider the following code:

import aiofiles
import tempfile

async def main():
    return [await fds_test(i) for i in range(2000)]

async def fds_test(index):
    print(f"Writing {index}")
    handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)
    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')
        await fp.write('other stuff')
        await fp.write('EOF\n')

    print(f"Reading {index}")
    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), stdout=asyncio.subprocess.DEVNULL, close_fds=True)
    await process.wait()
    print(f"Process terminated {index}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

This spawns processes one after the other (sequentially), so I expect the number of files open at any given time to stay around one. But that's not the case, and at some point I get the following error:

/Users/cglacet/.pyenv/versions/3.8.0/lib/python3.8/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1410             # Data format: "exception name:hex errno:description"
   1411             # Pickle is not used; it is complex and involves memory allocation.
-> 1412             errpipe_read, errpipe_write = os.pipe()
   1413             # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
   1414             low_fds_to_close = []

OSError: [Errno 24] Too many open files

I tried running the same code without the stdout=asyncio.subprocess.DEVNULL option, but it crashes with the exact same error. This answer suggested that option might be where the problem comes from, and the traceback also points at the line errpipe_read, errpipe_write = os.pipe(), but that doesn't seem to be the cause.

In case you need more information, here is an overview from the output of lsof | grep python:

python3.8 19529 cglacet    7u      REG                1,5        138 12918796819 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpuxu_o4mf.dat
# ... 
# ~ 2000 entries later : 
python3.8 19529 cglacet 2002u      REG                1,5        848 12918802386 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpcaakgz3f.dat

These are the temporary files that my subprocesses are reading. The rest of the output from lsof seems like legit stuff (libraries opened, like pandas/numpy/scipy/etc.).
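
I can also watch the leak from inside the process. The helper below is only a debugging aid I'm adding for this question (it is not part of the failing code); it assumes /dev/fd is available, which is the case on macOS and Linux:

import os

def count_open_fds():
    # /dev/fd lists the calling process's open file descriptors on
    # macOS and Linux. Listing it briefly opens one extra descriptor,
    # so the count is off by one, which is fine for spotting a leak.
    return len(os.listdir('/dev/fd'))

Printing count_open_fds() at the start of fds_test should show the count creeping up by one on every iteration if something is leaking.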

Now I have a doubt: maybe the problem comes from the aiofiles asynchronous context manager? Maybe aiofiles is the one not closing the files, rather than create_subprocess_exec?

There is a similar question here, but nobody really tries to explain or solve the problem (they only suggest increasing the limit, as sketched below): Python Subprocess: Too Many Open Files. I would really like to understand what is going on; my first goal is not necessarily to work around the problem temporarily (in the future I want to be able to run fds_test as many times as needed). My goal is to have a function that behaves as expected. I probably have to change either my expectations or my code, which is why I'm asking this question.
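
For completeness, the workaround suggested in that thread boils down to raising the per-process descriptor limit with the resource module, roughly as in the sketch below (4096 is an arbitrary value I picked); it only pushes the limit further away instead of fixing the leak:

import resource

# Raise the soft limit on open file descriptors for this process.
# The new soft limit must not exceed the hard limit, otherwise
# setrlimit() raises ValueError.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))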


As suggested in the comments here, I also tried to run python -m test test_subprocess -m test_close_fds -v which gives:

== CPython 3.8.0 (default, Nov 28 2019, 20:06:13) [Clang 11.0.0 (clang-1100.0.33.12)]
== macOS-10.14.6-x86_64-i386-64bit little-endian
== cwd: /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/test_python_52961
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 5.29 Run tests sequentially
0:00:00 load avg: 5.29 [1/1] test_subprocess
test_close_fds (test.test_subprocess.POSIXProcessTestCase) ... ok
test_close_fds (test.test_subprocess.Win32ProcessTestCase) ... skipped 'Windows specific tests'

----------------------------------------------------------------------

Ran 2 tests in 0.142s

OK (skipped=1)

== Tests result: SUCCESS ==

1 test OK.

Total duration: 224 ms
Tests result: SUCCESS

So it seems file descriptors should be closed correctly by subprocess; I'm a bit lost here.

cglacet
  • Maybe look at [this thread](https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files?noredirect=1&lq=1) – Zionsof Mar 30 '20 at 11:27
  • That's the one I linked in my question, people only suggest ways to get around the problem without really solving it (increasing the limit on how many files the process can open simultaneously). One answer does suggest adding `close_fds`, which also refers to this: https://www.python.org/dev/peps/pep-0446/#issues-fixed-in-the-subprocess-module – cglacet Mar 30 '20 at 11:33
  • The first question you linked to has a good point: why are you using PIPE if you're not reading from the process at all? Maybe that could help you with your problem. That, or try to `communicate()` to read the data (and possibly ignore it) – Zionsof Mar 30 '20 at 11:39
  • I simply used it to suppress all output from my bash command. But I tried without that option and it still crashes with the exact same error. – cglacet Mar 30 '20 at 12:27
  • @cglacet To just get rid of output, send it to [`subprocess.DEVNULL`](https://docs.python.org/3/library/subprocess.html#subprocess.DEVNULL). Generally the fd should be closed when you explicitly do so (a context manager would do that upon exiting the context) or the kernel will do it for you when the process terminates (its descriptor table is ditched). – Ondrej K. Mar 30 '20 at 13:54
  • @OndrejK. you're right, I had no idea `dev/null` option existed (first time I use this mechanism). Thanks. – cglacet Mar 31 '20 at 11:36
  • @cglacet Great, that helps. I would still be a little curious why the fds would be dangling around... but most likely their processes are as well? – Ondrej K. Mar 31 '20 at 11:56
  • @OndrejK. no, processes are all terminated. In `lsof` the dangling files are attached to the main python process. I'll try to implement a minimal working example, by now someone would have noticed a big/obvious mistake in my code. – cglacet Mar 31 '20 at 15:00
  • @OndrejK. That was surprisingly easy, the most simple code crashed right away. I updated my answer with this example. – cglacet Mar 31 '20 at 15:14
  • @OndrejK. oh, I found something, if I replace the temporary file by `temp_filename = f"test_{index}.dat"`, the code doesn't crash anymore. As I suspected the problem doesn't come from subprocess but from somewhere else. On the other hand I would never have suspected this part of the code. I'll investigate some more. – cglacet Mar 31 '20 at 15:21

1 Answer


The problem doesn't come from create_subprocess_exec; the problem in this code is that tempfile.mkstemp() actually opens the file:

mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) …

I thought it would only create the file. To solve my problem I simply added a call to os.close(handle), which removes the error but is a bit weird (the file ends up being opened twice).
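
Here is a sketch of that first, minimal fix applied to the function from the question (trimmed a bit; the extra os.close call is the only real change):

import os
import tempfile

import aiofiles
import asyncio


async def fds_test(index):
    # mkstemp() creates the file AND returns an already-open OS-level
    # descriptor; closing that descriptor right away is the one-line fix.
    handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)
    os.close(handle)

    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')

    process = await asyncio.create_subprocess_exec(
        'cat', temp_filename, stdout=asyncio.subprocess.DEVNULL
    )
    await process.wait()

Opening a descriptor just to close it again felt a bit clumsy, so I got rid of mkstemp entirely and rewrote the function as: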

import aiofiles
import tempfile
import uuid


async def main():
    await asyncio.gather(*[fds_test(i) for i in range(10)])

async def fds_test(index):
    dir_name = tempfile.gettempdir()
    file_id = f"{tempfile.gettempprefix()}{uuid.uuid4()}"
    temp_filename = f"{dir_name}/{file_id}.dat"

    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')

    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), close_fds=True)
    await process.wait()


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Now I wonder why the error was raised by subprocess and not by tempfile.mkstemp. Maybe it's because subprocess opens so many more files at once that it becomes unlikely that the temporary file creation is the call that breaks the limit …

cglacet
  • "why the error was raised by subprocess" -- the place where a resource (opened file) is leaked is not necessarily the same place where the resource is finally exhausted. To understand it more clearly: you can consume a lot of memory in one place and get MemoryError in a completely different place in an ordinary code. – jfs Mar 31 '20 at 16:55
  • I understand that it can happen, but my gut tells me it shouldn't always happen. If you consume 99 units of a resource in one place and one unit in a second place, I feel like most of the time (statistically speaking) the limit should be reached where 99% of the resource is consumed (not sure my point is clear). In this case it's more like 50-50 (is it?), so I thought it should sometimes crash on process creation but sometimes also on temporary file creation. Anyway, I'm glad I finally found what the problem was, thanks for your help. – cglacet Apr 01 '20 at 09:12
  • The logic in your comment works for finding a performance bottleneck in a loop: if some calls take 99% of the cycle's time, then stopping the iterations at random will likely land you in those expensive calls. It is less applicable if you are dealing with a finite resource that you can exhaust only once: imagine your program is AB, where A consumes 9 tokens, B consumes 2 tokens, and you are given 10 tokens total: your program will exhaust the tokens in the B part every time, despite B consuming several times less. – jfs Apr 01 '20 at 16:35
  • I see your point, the execution is not random (for example here it's: create a file, create a subprocess, repeat), so if it fails after some fixed number of files created, it will always fail at the exact same place. I should try to add some randomness (skip the shell command with some fixed probability); this way I could make it crash in either one of the two parts. – cglacet Apr 01 '20 at 17:27