I have a list of strings in Python and want to run a recursive grep on each string in the list. I am using the following code:

import subprocess as sp 
for python_file in python_files:
    out = sp.getoutput("grep -r python_file . | wc -l")
    print(out)

The output I am getting is the grep of the literal string "python_file". What mistake am I making, and how should I correct it?

2 Answers

Your code has several issues. The immediate answer to what you seem to be asking was given in a comment, but there are more things to fix here.

If you want to pass in a variable instead of a static string, you have to use some sort of string interpolation.
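
As a minimal sketch of that interpolation, an f-string can place the value of each list item into the command, and shlex.quote() protects against spaces or shell metacharacters in that value. The helper names build_grep_command and count_with_shell are made up for illustration; they are not part of the question's code.

```python
import shlex
import subprocess as sp


def build_grep_command(pattern):
    # Hypothetical helper: interpolate the *value* of `pattern` into the
    # command, instead of the literal text "python_file".
    return f"grep -r {shlex.quote(pattern)} . | wc -l"


def count_with_shell(pattern):
    # Same approach as the question: run the whole pipeline through the shell.
    return sp.getoutput(build_grep_command(pattern))


# Show the command that would be run for one example value.
print(build_grep_command("utils.py"))
```

Inside the loop from the question, this would be called as count_with_shell(python_file).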

grep already knows how to report how many lines matched; use grep -c (note that with -r it prints one count per file rather than a single total). Or just ask Python to count the number of output lines. Trimming off the pipe to wc -l also lets you avoid invoking a shell, which is a good thing; see also "Actual meaning of shell=True in subprocess".
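
Counting the output lines in Python without a shell could look roughly like this. It is a sketch that assumes grep is on the PATH; the helper name count_matching_lines and its root parameter are made up for illustration.

```python
import subprocess as sp


def count_matching_lines(pattern, root="."):
    # Hypothetical helper. The arguments are passed as a list, so no shell is
    # involved and the pattern needs no quoting. -e marks the next argument
    # as the pattern even if it happens to start with "-".
    result = sp.run(["grep", "-r", "-e", pattern, root], stdout=sp.PIPE)
    # grep exits with status 1 when nothing matches; stdout is then empty,
    # so the count is simply 0. The output is kept as bytes to avoid any
    # decoding problems with non-text files.
    return len(result.stdout.splitlines())
```

In the question's loop, print(count_matching_lines(python_file)) would then replace the getoutput() call.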

grep already knows how to search for multiple expressions. Try passing in the whole list as an input file with grep -f -.

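import subprocess as sp

# Feed every pattern to a single grep process on stdin (-f -), one per line.
# run() is used rather than check_output() because grep exits with status 1
# when nothing matches, which check_output() would raise as an error.
result = sp.run(
    ["grep", "-r", "-f", "-", "."],
    input="\n".join(python_files), text=True, stdout=sp.PIPE)
print(len(result.stdout.splitlines()))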

If you want to speed up your processing and the patterns are all static strings, try also adding the -F option to grep.

Of course, all of this is relatively easy to do natively in Python, too. You should easily be able to find examples with os.walk().
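
For what it's worth, a rough native sketch with os.walk() might look like the following. It assumes the patterns are fixed strings (like grep -F) rather than regular expressions, and the function name count_matches is made up for illustration.

```python
import os


def count_matches(patterns, root="."):
    """Count, per pattern, the lines under root that contain it as a substring."""
    counts = {pattern: 0 for pattern in patterns}
    # os.walk() visits root and every subdirectory, like grep -r.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" skips bytes that cannot be decoded as text.
                with open(path, errors="ignore") as fh:
                    for line in fh:
                        for pattern in patterns:
                            if pattern in line:
                                counts[pattern] += 1
            except OSError:
                # Unreadable files are skipped rather than aborting the walk.
                continue
    return counts
```

Calling count_matches(python_files) reads each file once and returns a dict with one count per string, which avoids starting a separate process for every pattern.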

tripleee

Your intent isn't totally clear from the way you've written your question, but the first argument to grep is the pattern (the literal text python_file in your example) and the second is the file(s) or directory to search (. in your example).

You could write this in native Python or just use grep directly, which is probably easier than using both!

grep args

  • --count reports just the number of matching lines
  • --file reads one or more newline-separated patterns from a file (manpage)

grep --count --file patterns.txt -r .

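import re
from pathlib import Path

for pattern in patterns:
    regex = re.compile(pattern)
    count = 0
    # rglob("*") descends into subdirectories, like grep -r
    for path in Path(".").rglob("*"):
        if not path.is_file():
            continue  # directories cannot be opened as files
        with open(path, errors="ignore") as fh:
            for line in fh:
                # search() matches anywhere in the line, as grep does;
                # match() would only match at the start of the line
                if regex.search(line):
                    count += 1
    print(count)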

NOTE that the behavior in your question would get a separate line count for each pattern, while you may really want a single count.

ti7
  • `glob(".")` will only traverse the current directory, whereas `grep -r` also examines all subdirectories recursively. – tripleee Aug 03 '21 at 06:05
  • @tripleee ah, it was a rush job after you mashed go on your grep-only answer https://meta.stackexchange.com/questions/9731/fastest-gun-in-the-west-problem – ti7 Aug 03 '21 at 06:07
  • It's not hard to fix, but it's also not hard to find existing questions about how to do this properly, and the requirements in the question are rather unclear, so I didn't go this route. – tripleee Aug 03 '21 at 06:08