
I have been trying to execute piped commands via the subprocess module, but am having some issues.

I have seen the following solutions proposed, but none of them has solved my problem:

  1. sending a sequence (list) of arguments
  2. several Popen commands using subprocess.PIPE
  3. sending a string with shell=True

I would like to avoid the third option, with shell=True, although it did produce the expected results on my test system.

Here is the command that works in Terminal, which I would like to replicate:

tr -c "[:alpha:]" " " < some\ file\ name_raw.txt | sed -E "s/ +/ /g" | tr "[:upper:]" "[:lower:]" > clean_in_one_command.txt

This command cleans files as required. It first runs tr on an input file whose name contains spaces, replacing every non-alphabetic character with a space. The output is piped to sed, which collapses each run of spaces into a single space, and then to tr again, which converts everything to lower case.
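To make the three stages concrete, here is a rough pure-Python sketch of what each one does to the text. It assumes ASCII input (as in the C locale), where `[:alpha:]` means A-Z and a-z; in other locales `tr` may treat more characters as letters, and `sed` implementations can differ in how they handle a missing final newline, so this is an illustration rather than an exact equivalent. The function name `clean` is hypothetical.

```python
import re


def clean(text: str) -> str:
    # tr -c "[:alpha:]" " ": every character that is not an ASCII letter
    # (punctuation, digits, spaces, newlines) becomes a space.
    text = re.sub(r"[^A-Za-z]", " ", text)
    # sed -E "s/ +/ /g": collapse each run of spaces into a single space.
    text = re.sub(r" +", " ", text)
    # tr "[:upper:]" "[:lower:]": convert everything to lower case.
    return text.lower()


print(repr(clean("Hello, World!\n")))  # 'hello world '
```

Note that the first stage also turns newlines into spaces, so the cleaned output ends up on a single line.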

After several iterations, I ended up breaking it all down into the simplest form I could, implementing the second method above: several instances of Popen, passing information using subprocess.PIPE. It is long-winded, but will hopefully make debugging easier:

from subprocess import run, Popen, PIPE

cmd1_func = ['tr']
cmd1_flags = ['-c']
cmd1_arg1 = [r'"[:alpha:]\"']
cmd1_arg2 = [r'" "']
cmd1_pass_input = ['<']
cmd1_infile = ['some file name_raw.txt']
cmd1 = cmd1_func + cmd1_flags + cmd1_arg1 + cmd1_arg2 + cmd1_pass_input + cmd1_infile
print("Command 1:", cmd1)    # just to see if things look fine

cmd2_func = ['sed']
cmd2_flags = ['-E']
cmd2_arg = [r'"s/ +/ /g\"']
cmd2 = cmd2_func + cmd2_flags + cmd2_arg
print("command 2:", cmd2)

cmd3_func = ['tr']
cmd3_arg1 = ["\"[:upper:]\""]
cmd3_arg2 = ["\"[:lower:]\""]
cmd3_pass_output = ['>']
cmd3_outfile = [output_file_abs]
cmd3 = cmd3_func + cmd3_arg1 + cmd3_arg2 + cmd3_pass_output + cmd3_outfile
print("command 3:", cmd3)

# run first command into first process
proc1 = Popen(cmd1, stdout=PIPE)
# pass its output as input to second process
proc2 = Popen(cmd2, stdin=proc1.stdout, stdout=PIPE)
# close the parent's copy of proc1's output pipe (proc2 still reads from it)
proc1.stdout.close()
# output of second process into third process
proc3 = Popen(cmd3, stdin=proc2.stdout, stdout=PIPE)
# close the parent's copy of proc2's output pipe
proc2.stdout.close()
# collect any output from the final process (to send to a logger)
output = proc3.communicate()[0]

I would then simply write the output to a text file, but the program doesn't get that far, because I receive the following error:

usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2
sed: 1: ""s/ +/ /g\"": invalid command code "
usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2

This suggests that my arguments are not being passed correctly: it seems that both the ' and the " quote marks are reaching sed as ". I do actually need one set of them there explicitly, but if I only put one set into my list, they are stripped from the command completely, which also breaks it.
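The standard library's `shlex.split`, which follows POSIX shell quoting rules, shows how a shell splits and unquotes the terminal command into words. This is a minimal illustration, not a fix: the quotes are consumed during splitting and never reach `tr` or `sed`, so the list elements passed to `Popen` should not contain them. `shlex.split` treats `<` as just another word; a real shell would perform the redirection itself instead, and since `Popen` runs no shell, putting `'<'` in the list hands `tr` a literal `<` argument.

```python
import shlex

# The command exactly as typed in the terminal (backslashes escape the spaces).
command = r'tr -c "[:alpha:]" " " < some\ file\ name_raw.txt'

# Quotes are removed and the escaped spaces stay inside one word.
print(shlex.split(command))
# ['tr', '-c', '[:alpha:]', ' ', '<', 'some file name_raw.txt']

# The sed script likewise arrives without any quote characters.
print(shlex.split('sed -E "s/ +/ /g"'))
# ['sed', '-E', 's/ +/ /g']
```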

Things I have tried:

  1. not using raw string literals for the arguments that need explicit quotation marks
  2. escaping and double-escaping explicit quotations
  3. passing the entire command as one list into the subprocess.Popen and subprocess.run functions.
  4. playing around with the shlex package to deal with quotations
  5. removing the parts cmd3_pass_output = ['>'] and cmd3_outfile = [output_file_abs] so that only the raw (piped) output is dealt with.

Am I missing something, or am I going to be forced to use shell=True?

Asked by n1k31t4

1 Answer

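This program appears to do what you want. Each process is started separately, and as you build them, the output of one is piped into the input of the next. The files are opened in Python rather than through the shell's < and > redirections: the input file becomes stdin of the first process and the output file becomes stdout of the last. Because no shell is involved, each argument is passed as its own tuple element, without any extra quote characters.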

#! /usr/bin/env python3
import subprocess


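def main():
    # Open the files here instead of using the shell's < and > redirections.
    with open('raw.txt', 'r') as stdin, open('clean.txt', 'w') as stdout:
        # One argument per tuple element; no shell, so no extra quotes.
        step_1 = subprocess.Popen(
            ('tr', '-c', '[:alpha:]', ' '),
            stdin=stdin,
            stdout=subprocess.PIPE
        )
        # Reads step_1's output, like `| sed ...` in the shell.
        step_2 = subprocess.Popen(
            ('sed', '-E', 's/ +/ /g'),
            stdin=step_1.stdout,
            stdout=subprocess.PIPE
        )
        # The last stage writes straight into the output file.
        step_3 = subprocess.Popen(
            ('tr', '[:upper:]', '[:lower:]'),
            stdin=step_2.stdout,
            stdout=stdout
        )
        # Let tr finish writing before the with block closes the files.
        step_3.wait()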


if __name__ == '__main__':
    main()

Answered by Noctis Skytower
  • Worked like a charm; very clever idea to take the files out of the equation, making it all more Pythonic. If there are any drawbacks to this approach, I haven't noticed them. Does `step_3.wait()` simply allow the file to be written before the process is closed? – n1k31t4 Apr 01 '17 at 15:48
  • Yes, the call to `wait` is there to let `tr` finish what it is doing before the files are closed. – Noctis Skytower Apr 01 '17 at 15:57
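Building on the `wait` discussion, here is a minimal sketch of a general helper, assuming you want the exit status of every stage rather than only the last. `run_pipeline` is a hypothetical name, not part of `subprocess`. It follows the pattern from the "Replacing shell pipeline" section of the `subprocess` documentation: close the parent's copy of each intermediate pipe, so an earlier stage receives SIGPIPE if a later one exits early, and wait on every process so none is left behind as a zombie.

```python
import subprocess
from typing import IO, List, Sequence


def run_pipeline(commands: Sequence[Sequence[str]], stdin: IO, stdout: IO) -> List[int]:
    """Run `cmd1 | cmd2 | ...` without a shell; return each stage's exit status."""
    procs: List[subprocess.Popen] = []
    prev_stdout = stdin
    for i, argv in enumerate(commands):
        last = i == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=prev_stdout,
            # Intermediate stages write to a pipe; the last writes to `stdout`.
            stdout=stdout if last else subprocess.PIPE,
        )
        # The new process now holds the read end, so the parent closes its
        # copy; the earlier stage then gets SIGPIPE if this one exits early.
        if procs:
            procs[-1].stdout.close()
        procs.append(proc)
        prev_stdout = proc.stdout
    # Wait for every stage, not just the last one.
    return [p.wait() for p in procs]
```

Called inside the answer's `with` block as `run_pipeline([('tr', '-c', '[:alpha:]', ' '), ('sed', '-E', 's/ +/ /g'), ('tr', '[:upper:]', '[:lower:]')], stdin, stdout)`, it should return `[0, 0, 0]` when every stage succeeds.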