I have been trying to execute piped commands via the subprocess
module, but am having some issues.
I have seen the solutions proposed below, but none have solved my problem:
- sending a sequence (list) of arguments
- several Popen commands using subprocess.PIPE
- sending a string with shell=True
I would like to avoid the third option, with shell=True, although it did produce the expected results on my test system.
Here is the command that works in Terminal, which I would like to replicate:
tr -c "[:alpha:]" " " < some\ file\ name_raw.txt | sed -E "s/ +/ /g" | tr "[:upper:]" "[:lower:]" > clean_in_one_command.txt
This command cleans files as required. It first runs tr on an input file, which has spaces in its name, replacing every non-alphabetic character with a space. The output is passed to sed, which collapses runs of spaces into a single space, and then to tr again to make everything lower case.
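To make the expected result concrete, here is a rough pure-Python approximation of what the three stages do to the text. It is only a sketch: the function name clean_text is made up for illustration, and it assumes plain ASCII input, whereas the real [:alpha:] class depends on the locale.

```python
import re


def clean_text(text: str) -> str:
    """Approximate the tr | sed | tr pipeline for ASCII input (illustrative only)."""
    # tr -c "[:alpha:]" " ": replace every non-letter, newlines included, with a space
    spaced = re.sub(r"[^A-Za-z]", " ", text)
    # sed -E "s/ +/ /g": collapse each run of spaces into a single space
    collapsed = re.sub(r" +", " ", spaced)
    # tr "[:upper:]" "[:lower:]": convert everything to lower case
    return collapsed.lower()


print(clean_text("Hello,  World!\n42 Apples"))  # hello world apples
```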
After several iterations, I ended up breaking it all down into the simplest form I could, implementing the second method above: several instances of Popen, passing information using subprocess.PIPE. It is long-winded, but will hopefully make debugging easier:
from subprocess import run, Popen, PIPE
cmd1_func = ['tr']
cmd1_flags = ['-c']
cmd1_arg1 = [r'"[:alpha:]\"']
cmd1_arg2 = [r'" "']
cmd1_pass_input = ['<']
cmd1_infile = ['some file name_raw.txt']
cmd1 = cmd1_func + cmd1_flags + cmd1_arg1 + cmd1_arg2 + cmd1_pass_input + cmd1_infile
print("Command 1:", cmd1) # just to see if things look fine
cmd2_func = ['sed']
cmd2_flags = ['-E']
cmd2_arg = [r'"s/ +/ /g\"']
cmd2 = cmd2_func + cmd2_flags + cmd2_arg
print("command 2:", cmd2)
cmd3_func = ['tr']
cmd3_arg1 = ["\"[:upper:]\""]
cmd3_arg2 = ["\"[:lower:]\""]
cmd3_pass_output = ['>']
cmd3_outfile = [output_file_abs]
cmd3 = cmd3_func + cmd3_arg1 + cmd3_arg2 + cmd3_pass_output + cmd3_outfile
print("command 3:", cmd3)
# run first command into first process
proc1 = Popen(cmd1, stdout=PIPE)
# pass its output as input to second process
proc2 = Popen(cmd2, stdin=proc1.stdout, stdout=PIPE)
# close our copy of the first process's output pipe
proc1.stdout.close()
# output of second process into third process
proc3 = Popen(cmd3, stdin=proc2.stdout, stdout=PIPE)
# close our copy of the second process's output pipe
proc2.stdout.close()
# save any output from final process to a logger
output = proc3.communicate()[0]
I would then simply write the output to a text file, but the program doesn't get that far, because I receive the following error:
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
sed: 1: ""s/ +/ /g\"": invalid command code "
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
This suggests that my arguments are not being passed correctly. It seems the ' and " quote marks are both being passed into sed as ". I do actually need one of them there explicitly. If I only put one set into my list, then they are stripped in the command completely, which also breaks the command.
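One way to check which arguments the shell itself would hand to each program is shlex.split, which applies POSIX-style quoting rules to a command string. The sketch below only inspects the tokens and runs nothing. The variable names are made up for illustration.

```python
import shlex

# Two stages of the pipeline, exactly as typed in the terminal
sed_text = 'sed -E "s/ +/ /g"'
tr_text = 'tr -c "[:alpha:]" " "'

# shlex.split follows shell quoting rules: quotes group words and are then removed
sed_argv = shlex.split(sed_text)
tr_argv = shlex.split(tr_text)

print(sed_argv)  # ['sed', '-E', 's/ +/ /g']
print(tr_argv)   # ['tr', '-c', '[:alpha:]', ' ']
```

If that is right, the quote characters are shell syntax rather than part of the arguments, so a list passed to Popen would not need them.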
Things I have tried:
- not declaring literal strings for those strings where I need explicit quotations
- escaping and double-escaping explicit quotations
- passing the entire command as one list into the subprocess.Popen and subprocess.run functions
- playing around with the shlex package to deal with quotations
- removing the parts cmd3_pass_output = ['>'] and cmd3_outfile = [output_file_abs] so that only the raw (piped) output is dealt with
Am I missing something, or am I going to be forced to use shell=True?
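For reference, here is a minimal sketch of how the same pipeline might look without a shell. It assumes that < and > are shell redirections rather than arguments, so the files are opened in Python and passed as stdin and stdout. The function name clean_file and its src and dst parameters are hypothetical. It also assumes tr and a sed that accepts -E are on the PATH.

```python
from subprocess import Popen, PIPE


def clean_file(src: str, dst: str) -> int:
    """Run tr | sed | tr without a shell; src and dst are hypothetical paths."""
    # Opening the files here stands in for the shell's `<` and `>` redirections
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        # Each list item is one argument, passed verbatim with no shell quoting
        proc1 = Popen(["tr", "-c", "[:alpha:]", " "], stdin=infile, stdout=PIPE)
        proc2 = Popen(["sed", "-E", "s/ +/ /g"], stdin=proc1.stdout, stdout=PIPE)
        # Drop our copy of the pipe so proc1 gets SIGPIPE if proc2 exits early
        proc1.stdout.close()
        proc3 = Popen(["tr", "[:upper:]", "[:lower:]"],
                      stdin=proc2.stdout, stdout=outfile)
        proc2.stdout.close()
        # Wait for every stage so no child process is left unreaped
        proc3.wait()
        proc1.wait()
        proc2.wait()
    return proc3.returncode
```

Called as clean_file('some file name_raw.txt', 'clean_in_one_command.txt'), the file name with spaces needs no escaping, because it never passes through a shell.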