I am going to use subprocess to run commands on Linux, but I don't know when p1.stdout gets closed. This may affect the integrity of the data that should be written to the test file.

import subprocess

p1 = subprocess.Popen('cat ', stdin=subprocess.PIPE, stdout=subprocess.PIPE, shell=True, universal_newlines=True)
p2 = subprocess.Popen('cat ', stdin=p1.stdout, stdout=open("test", 'w'), shell=True, universal_newlines=True)
p1.stdin.write('asda'*100)
p1.stdin.close()

I know I can call p1.communicate() after writing all the data to p1.stdin, but that function closes p1.stdout too, and I am not certain whether there is a chance that test will lose part of the data.

BAKE ZQ
  • It's only after the running `p1` closes *its own* copy of the stdout handle that `p2` will see an EOF on its stdin, if you've successfully closed all other copies (held by the parent process or similar). If you *haven't* done that, you'll see `p2` hang because it never sees that EOF. – Charles Duffy May 17 '19 at 14:52
  • That said, `shell=True` is a really bad idea -- here and in general. Take it out, and just make it `['cat']` with no trailing space. – Charles Duffy May 17 '19 at 14:53
  • ..."which is a mediator?", though -- there is no one mediator as such; they each decide when to exit independently. However, `p2` will make that independent decision if it happens to get an EOF on its stdin. – Charles Duffy May 17 '19 at 14:54
  • @CharlesDuffy If I close p1.stdout, then how could p1 read from it? – BAKE ZQ May 17 '19 at 14:59
  • Close it **in the parent process**; that still leaves it open in the already-launched children. Each process has its own independent file-descriptor table. – Charles Duffy May 17 '19 at 15:04
  • See the example in https://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline – Charles Duffy May 17 '19 at 15:05
  • Do you mean p2 is p1's child? – BAKE ZQ May 17 '19 at 15:05
  • No, both `p1` and `p2` are children of your Python interpreter; they're siblings of each other. – Charles Duffy May 17 '19 at 15:05
  • So p1 and p2 both have p1.stdout open, and they have two independent file descriptors related to this p1.stdout? – BAKE ZQ May 17 '19 at 15:06
  • The issue that makes closing FDs necessary is that if there's more than one handle on the input side of a FIFO, the output side will only return EOF when **all** handles -- the copies in all the different processes -- have closed. – Charles Duffy May 17 '19 at 15:07
  • What do you mean, "this stdout"? p1 and p2 have two different stdouts. – Charles Duffy May 17 '19 at 15:07
  • ...however, `p1.stdin`, `p2.stdin`, `p1.stdout`, and `p2.stdout` are all independent file handles **in your Python interpreter**; the `.stdin` ones attach to the write side of the FIFOs (while the read ones are held by the program that was run), and the `.stdout` ones attach to the read sides of the FIFOs (while the write ones are held by the spawned off programs). – Charles Duffy May 17 '19 at 15:08
  • After you start `p2` and pass `stdin=p1.stdout`, there are now *two* copies of `p1.stdout`, the other one held by Python. But `p2` will never see an EOF until after the Python interpreter closes its copy. – Charles Duffy May 17 '19 at 15:09
  • ...which is to say: The advice in the linked documentation and in every other Stack Overflow question on this subject (of which there are many) is correct, and will not cause you data loss. – Charles Duffy May 17 '19 at 15:09
  • So using `p1.stdin.close()` or `p1.communicate()` would be fine? – BAKE ZQ May 17 '19 at 15:11
  • See in particular the answer to the linked question at https://stackoverflow.com/a/53688246/14122 – Charles Duffy May 17 '19 at 15:11
  • Not "or", *and*. You need to close the Python interpreter's copy of `p1.stdout` after you start `p2`. *After* you do that, you can run `p2.communicate()`. – Charles Duffy May 17 '19 at 15:12
  • Which one is the Python interpreter's copy of p1.stdin? – BAKE ZQ May 17 '19 at 15:13
  • The only one you can refer to from Python? That is, when you run `p1.stdout` in Python, it refers to the Python interpreter's copy of the read end of the pipe attached to p1's stdout, *because Python doesn't have access to the other copy, it's held by `p2` as entry 0 in its FD table*. – Charles Duffy May 17 '19 at 15:13
  • Yes, but you asked which was the interpreter's copy of p1.stdin, so I answered that. :) – Charles Duffy May 17 '19 at 15:14
  • So using p1.stdin.close() alone is not enough. But why do you suggest that I use p2.communicate()? According to your explanation, I should use p1.communicate(). – BAKE ZQ May 17 '19 at 15:16
  • `p2.communicate()` waits for `p2` to exit. That's the normal way shells work when you run a pipeline -- they wait for the *end* of the pipeline to exit, and return its exit status, even though input (and thus, indirectly, control over how much data is processed before exit) streams from the pipeline's beginning. – Charles Duffy May 17 '19 at 15:17
  • After I write all the data to p1.stdin, should I first call p1.communicate() and then p2.communicate()? – BAKE ZQ May 17 '19 at 15:19
  • I tried your suggestion. When I just call p2.communicate(), it hangs. – BAKE ZQ May 17 '19 at 15:23
  • Are you closing `p1.stdin`? If not, `p1` will hang waiting for more input. – Charles Duffy May 17 '19 at 15:25
  • I guess I figured out my confusion. p2 sees EOF on p1.stdout after the subprocess p1 closes the write end of p1.stdout, which happens after cat reads EOF from p1.stdin. Am I right? I also tested a bit to check what you said about the copies: closing p1.stdout (the read end) in Python at any time after p2 has been started (Popen) actually has no influence on the whole process. – BAKE ZQ May 18 '19 at 02:59
  • And to make a pipeline, I think what I need to do is close the first subprocess's stdin and wait for the last subprocess to terminate. – BAKE ZQ May 18 '19 at 03:02
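
A minimal sketch of where the comments end up, modeled on the "Replacing shell pipeline" example in the subprocess documentation linked above. It assumes `cat` is available on the PATH. The names `payload` and `out_path` are hypothetical, and the output goes to a temporary directory rather than to `test`, so treat this as an illustration rather than a definitive implementation.

```python
import os
import subprocess
import tempfile

payload = "asda" * 100

# Hypothetical output location standing in for the "test" file in the question.
out_path = os.path.join(tempfile.mkdtemp(), "test")

with open(out_path, "w") as out:
    # No shell=True: pass the command as a list, as suggested in the comments.
    p1 = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          universal_newlines=True)
    p2 = subprocess.Popen(["cat"], stdin=p1.stdout, stdout=out)

    # Close the parent's copy of the read end. p2 keeps its own copy as fd 0,
    # and p1 can now receive SIGPIPE if p2 exits early.
    p1.stdout.close()

    p1.stdin.write(payload)
    p1.stdin.close()  # flushes the buffer and lets p1 see EOF

    # Wait for the end of the pipeline first, then reap p1 as well.
    p2.wait()
    p1.wait()

with open(out_path) as f:
    written = f.read()

print(written == payload)
```

Closing `p1.stdin` is what lets the first `cat` finish; the second `cat` then sees EOF once the first one exits and its write end of the pipe goes away. Waiting on `p2` before reading the file back is what guards against checking the output too early.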

0 Answers