Why does a pipe to another process need to be closed plus set_close_on_exec to really close?

Question

So, I was trying to use OCaml to communicate with a Python process. I wanted to pipe the Python program to the Python interpreter's stdin, and then read the Python program's output back in the OCaml process.

I was able to solve it like this:

let py_program = {|
import time

while True:
    print('hi from Python', flush=True)
    time.sleep(0.25)
|}

let exec_py_program () =
  let cmd = "", [|"python3"; "-"|] in
  let pipe_out_fd, pipe_out_fd_unix = Lwt_unix.pipe_out () in

  (* Close the 1st time *)
  let () = Lwt_unix.set_close_on_exec pipe_out_fd_unix in

  let redir = `FD_move pipe_out_fd in

  let py_stream = Lwt_process.pread_lines ~stdin:redir cmd in

  let%lwt n = Lwt_unix.write_string pipe_out_fd_unix py_program 0 (String.length py_program) in
  if n < String.length py_program then failwith "Failed to write python to pipe" else

    let rec read_back () =
      match%lwt Lwt_stream.get py_stream with
      | Some str ->
        let%lwt () = Lwt_io.printl @@ "Got: " ^ str in
        read_back ()
      | None -> Lwt.return ()
    in

    (* Close the 2nd time *)
    let%lwt () = Lwt_unix.close pipe_out_fd_unix in

    read_back ()

Near the comment "Close the 1st time" I call "set_close_on_exec" on the file descriptor for the write end of the pipe that is mapped to the Python process's stdin. After sending over the Python program I close that same descriptor explicitly ("Close the 2nd time"). "set_close_on_exec" supposedly closes the file descriptor "when the process calls exec on another process".

If I leave either of these lines out, the Python process indefinitely keeps reading from its stdin and never begins executing, so "hi from Python" is never received. So my question is, why are these both necessary? It was mostly a guess on my part.

score 2 · Accepted Answer · answered Oct 09 '18 at 21:28

Starting a program on a POSIX operating system (like Linux) is done in two steps. First, the process launching the program is forked, which creates a copy of the running process. Then, the new process is replaced by the new program using a call to exec. When the process is forked, both resulting processes hold all of the open file descriptors. A pipe only signals end-of-file to its reader once every copy of its write end has been closed, so to actually close the pipe, the descriptor must be closed in both processes.
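To make the inheritance concrete, here is a minimal sketch using the plain Unix module rather than Lwt. The names read_all and fork_demo are made up for illustration, and Unix._exit assumes OCaml 4.12 or later. After the fork, both processes hold a copy of the write end, and the parent's read loop only reaches end-of-file because each of them closes its copy.

```ocaml
(* Build with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg demo.ml *)

(* Hypothetical helper: read from fd until end-of-file. *)
let read_all fd =
  let buf = Bytes.create 4096 in
  let out = Buffer.create 64 in
  let rec loop () =
    match Unix.read fd buf 0 (Bytes.length buf) with
    | 0 -> Buffer.contents out (* 0 bytes read means end-of-file *)
    | n -> Buffer.add_subbytes out buf 0 n; loop ()
  in
  loop ()

let fork_demo () =
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: the fork gave it copies of both r and w. *)
    Unix.close r;
    let msg = "hi from child\n" in
    ignore (Unix.write_substring w msg 0 (String.length msg));
    Unix.close w;
    (* Leave without running the parent's at_exit handlers. *)
    Unix._exit 0
  | pid ->
    (* Parent: its own copy of w is still open after the fork.
       Without this close, read_all below would block forever. *)
    Unix.close w;
    let data = read_all r in
    Unix.close r;
    ignore (Unix.waitpid [] pid);
    data

let () = print_string (fork_demo ())
```

If you comment out the Unix.close w in the parent branch, the program hangs in read_all, which is the same symptom as the Python interpreter waiting on its stdin.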

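Setting the close-on-exec flag causes the process to close the corresponding file descriptor as soon as exec is called. Hence, when you set this flag, only the old process still has the file descriptor open after the program has started. The flag does not close anything in the old process, though, so that process must still close its own copy explicitly. That is why both steps are needed: without the flag, the Python process holds the write end of its own stdin pipe open, and without the explicit close, your OCaml process does. In either case the interpreter never sees end-of-file on stdin.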
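Here is a rough sketch of the same two steps without Lwt, using only the Unix module. It assumes a wc program is on the PATH, and it uses wc -c as a stand-in for the Python interpreter because wc also produces no output until it has read its stdin to end-of-file. The function name count_bytes_via_child is made up for this example.

```ocaml
(* Build with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg demo.ml *)

(* Hypothetical helper: send text to "wc -c" and return the count it prints. *)
let count_bytes_via_child text =
  (* Pipe feeding the child's stdin. *)
  let child_stdin, to_child = Unix.pipe () in
  (* Pipe carrying the child's stdout back to us. *)
  let from_child, child_stdout = Unix.pipe () in
  (* Step 1: keep our ends of the pipes out of the exec'ed program. *)
  Unix.set_close_on_exec to_child;
  Unix.set_close_on_exec from_child;
  let pid =
    Unix.create_process "wc" [| "wc"; "-c" |]
      child_stdin child_stdout Unix.stderr
  in
  (* The child has its own copies of these two now. *)
  Unix.close child_stdin;
  Unix.close child_stdout;
  (* A short write is not handled here; fine for small inputs. *)
  ignore (Unix.write_substring to_child text 0 (String.length text));
  (* Step 2: close our copy of the write end so wc sees end-of-file. *)
  Unix.close to_child;
  let ic = Unix.in_channel_of_descr from_child in
  let line = input_line ic in
  close_in ic;
  ignore (Unix.waitpid [] pid);
  (* Some wc implementations pad the number with spaces. *)
  int_of_string (String.trim line)

let () =
  let text = "print('hi')\n" in
  let n = count_bytes_via_child text in
  assert (n = String.length text);
  Printf.printf "child counted %d bytes\n" n
```

Dropping either Unix.set_close_on_exec to_child or Unix.close to_child should make input_line block, since some process would then still hold the write end of the child's stdin.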
