Imagine that file.txt contains the following:

line one
line two
line three

Then these calls to subprocess.check_output fail. Python 2.7.5 reports that grep exited with code 1; under Python 3.8.5 the program hangs and needs a keyboard interrupt to stop:

# first approach
command = 'grep "one\|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

# second approach
command = 'grep -E "one|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

but this call succeeds (on both versions) and gives the expected output:

# third approach
command = 'grep -e one -e three ./file.txt'
results = subprocess.check_output(command.split())
print(results)

Why is this the case? My only guess as to why approaches one and two don't work is some interaction between the subprocess module and the | character, but I honestly have no idea why that would make the call fail: in the first approach the character is escaped, and in the second a flag tells grep that it doesn't need to be escaped. Additionally, approaches 1 and 2 work as expected when entered directly on the command line. Could it be that the subprocess module is interpreting the character as a pipe instead of a regex OR?

Pacopenguin
  • Does this answer your question? [How to use `subprocess` command with pipes](https://stackoverflow.com/questions/13332268/how-to-use-subprocess-command-with-pipes) – Ari Cooper-Davis Mar 21 '22 at 16:46
  • @AriCooper-Davis Not particularly. What I'm asking about isn't really related to piping output from one process together -- I'd like to know why regex OR doesn't appear to work in this usage of grep. Sure I could sidestep this problem by creating a long chain of pipes so that I only have to grep for one string at a time, but I'd much rather take approach 3 mentioned above over that. – Pacopenguin Mar 21 '22 at 16:50
  • Why are you using a subprocess here at all? `with open("file.txt") as lines: result = [line for line in lines if "one" in line or "three" in line]` – tripleee Mar 21 '22 at 16:56
  • @tripleee The file that I'm trying to parse from is rather large & after having done exactly that approach that you mentioned, I'm trying to speed up execution time by offloading the pattern matching to grep (which does this kind of thing mighty quick), and then just perform my post processing on whatever it returns. – Pacopenguin Mar 21 '22 at 17:00

1 Answer

The list returned by command.split() still contains the double quotes, which should no longer be there: str.split only splits on whitespace, so grep receives the quotes as part of the pattern and matches nothing. That's why Python provides shlex.split. It's also not hard to split the command manually, though you need to understand the role of quotes in the shell and remove them when there is no shell.

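import shlex
import subprocess

command = r'grep "one\|three" ./file.txt'  # raw string keeps the backslash literal
results1 = subprocess.check_output(['grep', r'one\|three', './file.txt'])  # explicit argument list, no quotes
results2 = subprocess.check_output(shlex.split(command))  # shlex.split removes the shell quotes
results3 = subprocess.check_output(command, shell=True)  # works, but better avoided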

Quotes tell the shell not to perform whitespace tokenization or wildcard expansion on a value. When there is no shell, simply pass a plain string wherever the shell allowed, or even required, a quoted string.
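
To see the difference concretely, here is a minimal sketch that only tokenizes the command strings and does not run grep at all. The second command, with a space inside the pattern, is a made-up illustration and not taken from the question.

```python
import shlex

# Raw string so the backslash reaches grep unchanged.
command = r'grep "one\|three" ./file.txt'

naive_tokens = command.split()       # splits on whitespace only; quotes stay in the pattern
shell_tokens = shlex.split(command)  # applies POSIX shell quoting rules; quotes are removed

print(naive_tokens)  # ['grep', '"one\\|three"', './file.txt']
print(shell_tokens)  # ['grep', 'one\\|three', './file.txt']

# Hypothetical pattern containing a space: str.split breaks it into two arguments.
spaced = 'grep "line one" ./file.txt'
naive_spaced = spaced.split()
shell_spaced = shlex.split(spaced)

print(naive_spaced)  # ['grep', '"line', 'one"', './file.txt']
print(shell_spaced)  # ['grep', 'line one', './file.txt']
```

With the naive split, grep is asked to find text that includes literal double quotes, which file.txt does not contain, so it exits with status 1 and check_output raises CalledProcessError. In the spaced case, grep would additionally treat one" as a file name.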

tripleee
  • I was originally using quotes there because in my actual use case, I was grepping for a string that had a space in it, but in light of this answer and some more testing, it seems like they're not needed. I'll look into quote usage and the shell, but do you have any resources you could recommend? – Pacopenguin Mar 21 '22 at 17:25
  • Several of my existing answers here in both the Python subprocess tag and the shell programming tags are related to this. Maybe start with [When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Mar 21 '22 at 17:30