0

I need to execute this line in my python script:

p = subprocess.Popen(["python ~/mrjobScript.py ~/jobs/"+date+"/input/* > ~/jobs/"+date+"/output/output-"+domain+".log","--domain "+domain],stdout=subprocess.PIPE, shell=True)

However the --domain switch should come right after the mrjobScript.py. How to achieve?

Stephan Kristyn
  • 15,015
  • 14
  • 88
  • 147
  • 1
    don't put the command in a list using shell=True, also the `>` won't work just redirect the output to a file object – Padraic Cunningham Jan 26 '15 at 09:58
  • Since you are trying to call Python with Python, it may make more sense for you to simply import the target module and invoke its functions directly. Then you can pass command line arguments much more naturally than going through subprocess and the OS. – merlin2011 Jan 26 '15 at 10:06
  • @merlin2011 that is not an option with mapreduce (mrjob), or at least I don't know how, because there are jobs spawned, etc.. – Stephan Kristyn Jan 26 '15 at 10:08
  • @Padraic Cunningham `>` is mrjob syntax and does work on the CLI. – Stephan Kristyn Jan 26 '15 at 10:09
  • @SirBenBenji, what is `>` supposed to do? Your command is not correct anyway, you pass a string with shell=True, you want a list of individual args and to not use shell=True if you pass a list, if `--domain` should come right after the `mrjobScript.py` then is should be right after – Padraic Cunningham Jan 26 '15 at 10:11
  • If you show a simplified version of mrjobscript.py which reproduces your problem it will be easier to help you. – merlin2011 Jan 26 '15 at 10:32
  • I will on code review and post a link here in the next few days. – Stephan Kristyn Jan 26 '15 at 11:07

2 Answers2

2

You don't use a list with shell=True, you should also redirect the output to a file to save it to a file.

import os
from subprocess import check_call
pt1 = os.path.expanduser("~/mrjobScript.py")
pt2 = os.path.expanduser("~/jobs/{}/input/*".format(date)

with open (os.path.expanduser("~/jobs/{}/output/output-{}.log".format(date,domain)),"w") as f:
     subprocess.check_call(["python",pt1,"--domain",domain,pt2],stdout=f)
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • why you format the date? this aint necessary imho. my mrjobsscript takes the date just as I hand it over. – Stephan Kristyn Jan 26 '15 at 10:14
  • 1
    @SirBenBenji, what do you mean by format? I am simply making the code cleaner by using str.format to join instead of + – Padraic Cunningham Jan 26 '15 at 10:15
  • `p = subprocess.check_output(["python","--domain",domainNoTld,pt1,pt2],stdout=f) File "/usr/lib/python2.7/subprocess.py", line 565, in check_output raise ValueError('stdout argument not allowed, it will be overridden.') ValueError: stdout argument not allowed, it will be overridden.` – Stephan Kristyn Jan 26 '15 at 10:22
  • No the written data is about 1000-2000 lines. So far your syntax seems to be solid now (except `- {}` where is a space too much). However I am getting a `returned non-zero exit status 2`. I suspect this is because of the missing `>`, see https://pythonhosted.org/mrjob/guides/quickstart.html#running-your-job-different-ways – Stephan Kristyn Jan 26 '15 at 10:37
  • --domain is a manually scripted mrjob switch, which works fine. – Stephan Kristyn Jan 26 '15 at 10:49
  • You have been very helpful in understanding this issue. – Stephan Kristyn Jan 26 '15 at 11:06
1

Assuming that your command works on the command line, then deleting the argument stdout, changing the order of the arguments, and passing a string instead of a list should be sufficient to make it reproduce the behavior on the command line.

import subprocess
subprocess.Popen("python ~/mrjobScript.py --domain {1} ~/jobs/{0}/input/* > ~/jobs/{0}/output/output-{1}.log ".format(date,domain), shell=True)

Note that this will start the subprocess and then go to the next line of your code. Your code will not wait for it. If you want your code to wait for it, you may want to use subprocess.call instead.

Note of Warning: It is recommended that the user of shell=True consult this answer to fully understand the implications of such usage. In particular, such applications should never let user-supplied arguments get passed directly or indirectly to the argument of a call to subprocess.Popen with shell=True without sanitization.

Community
  • 1
  • 1
merlin2011
  • 71,677
  • 44
  • 195
  • 329
  • using shell=True is usually a very bad idea – Padraic Cunningham Jan 26 '15 at 11:10
  • @PadraicCunningham, While that is the conventional wisdom, I have three comments. 1) The usual argument is security, and we see no obvious evidence that the OP is passing user-provided arguments into his own script. 2) If it was _always_ a bad idea, it would have been removed Python 3, and it was not. 3) Pragmatism. Developer time is precious, and bending over backwards to simulate the shell's functionality inside your own program is not pragmatic. – merlin2011 Jan 26 '15 at 11:21
  • The OP obviously has very little understanding of subprocess which is pretty evident from their attempt, while there is no risk now there may well be later and you did not once mention in your answer anything about the risks of using shell=True in your answer. Also three extra lines is hardly going to kill any developer. – Padraic Cunningham Jan 26 '15 at 11:25
  • @PadraicCunningham, I have added a disclaimer. Please feel free to edit the wording on the disclaimer if you think it's not clear enough. – merlin2011 Jan 26 '15 at 11:31
  • looks fine and I do think it is worth mentioning if you are going to use shell=True – Padraic Cunningham Jan 26 '15 at 11:38