0

I have seen a lot of solutions, but none of them works for me. I am trying to grep every file in a directory from Python for a specific string, count the number of lines that grep returns, and record that count in Python. Here's what I have tried most recently:

for f in try_files:
    print("trying %s"%f)
    s = subprocess.Popen("grep -r '%s' ../dir/*"%f)
    print(s)

I am getting this error:

trying accept_button_off_transparent.png
Traceback (most recent call last):
  File "findImages.py", line 17, in <module>
    s = subprocess.Popen("grep -r %s '../dir/*'"%f)
  File "/Users/agsrn/anaconda3/lib/python3.5/subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "/Users/agsrn/anaconda3/lib/python3.5/subprocess.py", line 1544, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: "grep -r accept_button_off_transparent.png '../dir/*'"
Agsrn-MacBook-Pro:images agsrn$ emacs findImages.py
Agsrn-MacBook-Pro:images agsrn$ python findImages.py
['accept_button_off_transparent.png', 'accept_button_on.png', 'accept_button_on_food.png', 'accept_button_on_transparent.png']
trying accept_button_off_transparent.png
Traceback (most recent call last):
  File "findImages.py", line 17, in <module>
    s = subprocess.Popen("grep -r '%s' ../dir/*"%f)
  File "/Users/agsrn/anaconda3/lib/python3.5/subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "/Users/agsrn/anaconda3/lib/python3.5/subprocess.py", line 1544, in _execute_child
    raise child_exception_type(errno_num, err_msg)

Ultimately I want to execute this query from within Python:

grep -r "filename" ../dir/* | wc -l

...and get that line count back as a number I can use for other logic. What's the best way to do this?

To be clear, my ultimate goal is to count, for each string in a list, how many times it is mentioned by any of the files in a directory. I am looking for strings inside files, not just file names. I suspect grep is much faster at this than Python, but the search sits inside a larger Python routine, hence the proposed hybrid solution.

helloB

4 Answers

0

Probably because of this, from the docs: "If args is a string, the interpretation is platform-dependent [...]. On POSIX, if args is a string, the string is interpreted as the name or path of the program to execute."

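The error you see says that your string is interpreted as a file name, so it fits this description. Either pass args as a list, or, because you also need the shell to expand the `*` wildcard, pass the whole command as a single string with shell=True. (A list combined with shell=True does not do what you want on POSIX: only the first element is run as the command.)

subprocess.Popen("grep -r '%s' ../dir/*" % f, shell=True)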
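For comparison, here is a minimal sketch of the list form without any shell, assuming grep is on the PATH. Because no shell is involved, the wildcard has to be expanded in Python, here with glob.glob. The function name count_grep_lines is hypothetical, and the -F flag (treat the search string as fixed text rather than a regex) is my assumption about what you want for file names containing dots.

```python
import glob
import subprocess

def count_grep_lines(needle, pattern="../dir/*"):
    """Count the lines printed by `grep -r -F needle <expanded paths>`."""
    # Expand the wildcard ourselves, since no shell is there to do it.
    paths = glob.glob(pattern)
    if not paths:
        return 0  # nothing to search
    # -F: fixed string, not a regex; "--" ends grep's option parsing.
    result = subprocess.run(
        ["grep", "-r", "-F", "--", needle] + paths,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # grep exits with status 1 when nothing matches, so check=True is not used.
    return len(result.stdout.splitlines())
```

Since the search string is passed as its own list element, it needs no shell quoting.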
JulienD
  • Thanks for this suggestion. Unfortunately, this solution gets me back to another problem I have. When I run this, no error, but I get the following: grep: ../dir/*: No such file or directory even though there IS such a file/directory. – helloB Apr 22 '16 at 21:46
  • Have you tried to print `glob.glob('../dir')`? So that we are sure. Also in one version in your question you search for "../dir*", while the other is "../dir/*" (with slash). – JulienD Apr 22 '16 at 21:51
  • @helloB Ok you need that: http://stackoverflow.com/questions/9997048/python-subprocess-wildcard-usage. I edit my answer with it. Note the warning though: https://docs.python.org/3/library/subprocess.html#security-considerations – JulienD Apr 22 '16 at 22:03
0

If you are open to another solution, counting files is easy with glob:

import glob
files = glob.glob("filename")
nfiles = len(files) 

Here "filename" is the pattern you want to match. You can then use nfiles in your logic.

Alejandro
  • I am not trying to count files. I am trying to grep all files in a directory for a particular string, and I am planning to do this for a lot of strings to identify strings that are not used inside any file inside the directory from a list of strings. – helloB Apr 22 '16 at 21:45
0

As an alternative to my other answer, you may want to do it entirely in Python:

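import re   # regex module

# files: list of file paths to search (for example, from glob.glob)
n = 0   # total number of matching lines across all files
for filename in files:
    with open(filename, 'r') as fh:
        for line in fh:
            # re.search finds the pattern anywhere in the line, like grep;
            # re.match would only match at the start of the line
            if re.search(r"...", line):
                n += 1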
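Since the goal is to check a whole list of strings, a pure-Python version can read every file once and test all strings per line. This is only a sketch under assumptions: the function name count_mentions is hypothetical, the strings are treated as plain substrings rather than regexes, and files are read as text with undecodable bytes ignored.

```python
import os
from collections import Counter

def count_mentions(directory, needles):
    """Map each needle to the number of lines, in any file, that contain it."""
    counts = Counter({needle: 0 for needle in needles})
    # os.walk recurses into subdirectories, similar to grep -r.
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            # errors="ignore" drops bytes that cannot be decoded (binary files).
            with open(path, "r", errors="ignore") as fh:
                for line in fh:
                    for needle in needles:
                        if needle in line:  # plain substring test, no regex
                            counts[needle] += 1
    return counts
```

The strings that are never mentioned are then the ones whose count is zero, e.g. `[s for s, c in count_mentions("../dir", try_files).items() if c == 0]`.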
JulienD
0

The following shell command will output the count that you want:

find ../dir -type f -exec cat {} + | grep -c 'filename'

The find command prints the contents of all the files in the directory, and the -c option tells grep to print the number of matching lines instead of the lines themselves.

You can run this command with subprocess.Popen(). You need to use the shell=True option so it processes this as a shell command, not the name of a program to run. And to get the output of the command, you need to specify stdout=PIPE and use communicate to read from it.

import subprocess
pipe = subprocess.Popen("find ../dir -type f -exec cat {} + | grep -c '%s'" % f, shell=True, stdout=subprocess.PIPE)
count = int(pipe.communicate()[0])

See Store output of subprocess.Popen call in a string
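Because the search string is interpolated into a shell command, a string containing a quote character would break it. A hedged sketch of the same pipeline wrapped in a function, with shlex.quote applied to both interpolated values (the function name count_matching_lines and its default directory are hypothetical):

```python
import shlex
import subprocess

def count_matching_lines(needle, directory="../dir"):
    """Run the find | grep -c pipeline and return the count as an int."""
    # shlex.quote makes both values safe to embed in the shell command;
    # "{{}}" in the format string yields the literal "{}" that find expects.
    command = "find {} -type f -exec cat {{}} + | grep -c -- {}".format(
        shlex.quote(directory), shlex.quote(needle)
    )
    pipe = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
    output, _ = pipe.communicate()
    # grep -c still prints "0" when nothing matches, even though it exits with 1.
    return int(output)
```

It can then be called once per string, for example `count_matching_lines(f)` inside the loop over try_files.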

Barmar