90

I need a way of searching a file using grep via a regular expression from the Unix command line. For example when I type in the command line:

python pythonfile.py 'RE' 'file-to-be-searched'

I need the regular expression 'RE' to be searched in the file and print out the matching lines.

Here's the code I have:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]

for line in open(f, 'r'):
    if re.search(search_term, line):
        print line,
        if line == None:
            print 'no matches found'

But when I enter a word which isn't present, 'no matches found' doesn't print.

wjandrea
David
  • 1
    If you really want Python-style regular expressions in grep, the --perl-regexp option (-P) is really close. It gives you Perl-style regular expression support. (Also, my favorite underused option to grep is --color=always.) – Ross Rogers Dec 17 '09 at 13:56

8 Answers

91

The natural question is why not just use grep?! But assuming you can't...

import re
import sys

file = open(sys.argv[2], "r")

for line in file:
     if re.search(sys.argv[1], line):
         print line,

Things to note:

  • search instead of match to find anywhere in string
  • trailing comma (,) after print (Python 2 syntax) suppresses the extra newline (each line already ends with one)
  • argv includes python file name, so variables need to start at 1
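To illustrate the first point, here is a minimal Python 3 sketch of the difference between match and search. The sample line and pattern are made up for illustration:

```python
import re

line = "error: disk full\n"  # hypothetical sample line

# re.match only tries to match at the very start of the string
at_start = re.match("disk", line)

# re.search scans the whole string for the first place the pattern matches
anywhere = re.search("disk", line)

print(at_start is None)      # match finds nothing: "disk" is not at the start
print(anywhere is not None)  # search finds "disk" in the middle of the line
```

For a grep-like tool you almost always want search, since grep reports a line when the pattern occurs anywhere in it.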

This doesn't handle multiple arguments (like grep does) or expand wildcards (like the Unix shell would). If you wanted this functionality you could get it using the following:

#!/usr/bin/env python3

import re
import sys
import glob

regexp = re.compile(sys.argv[1])
for arg in sys.argv[2:]:
    for fn in glob.iglob(arg):
        with open(fn) as file:
            for line in file:
                if regexp.search(line):
                    print(line, end='')
Hans Ginzel
Nick Fortescue
  • 9
    you should compile your regex before using the loops. – ghostdog74 Dec 17 '09 at 14:59
  • 6
    This has two down votes and I have no idea why. Anyone who downvoted want to leave a comment? I know you could add regex compilation etc, but I thought that would detract from the clarity of the answer. I don't think there is anything incorrect, and I've run the code, unlike some of the other answers – Nick Fortescue Dec 17 '09 at 15:24
  • This answer was perfect for me thanks. Just another quick question how would i print if no matches were found? – David Dec 17 '09 at 16:16
  • add a counter, and increase it if a match happens. At the end check it in an if and print if no answers found – Nick Fortescue Dec 17 '09 at 16:26
  • OK, I put a line counter in which counts the number of lines. But when I execute the program nothing is printed, i.e. it won't print 'no matches found' – David Dec 17 '09 at 16:47
  • can you add this problem as another stack overflow question, and show your source code as it looks at the moment? put a reference to this question, and give me a comment with the new question number – Nick Fortescue Dec 18 '09 at 11:21
  • 7
    "you should compile your regex before using the loops." No, Python will compile and cache it on its own. It's a common myth; it's a nice thing to do for readability reasons, though. –  Oct 11 '16 at 14:16
  • For Python 3 I found that `print(line,)` wouldn't remove the extra line breaks. However for all the list comprehension junkies out there, this works nicely and removes line breaks `print(''.join([line for line in open(sys.argv[2], 'r') if re.search(sys.argv[1], line)]))` – icc97 Oct 20 '16 at 07:53
  • 6
    The reasonable answer to the natural question is "Because the code is part of a much larger Python script, and who wants to call out to grep in such a case?" In short, I'm glad this question is here because I'm replacing a bash script with a Python script that is hopefully easier on the system. – Mike S Feb 09 '17 at 21:03
  • 1
    The answer to the question for me is "we want to grep in production logs on a Windows machine where they don't have proper grep, only a useless baregrep tool but all our clients will have Python installed as our system uses it". – CashCow Oct 23 '20 at 11:16
  • re.findall to mimic grep better as re.search only returns the first match – gseattle Mar 26 '21 at 09:33
  • @CashCow `grep` is such a vital tool for specific purposes and has been optimised fiendishly over the decades. Cygwin would be the way to go on a crappy old Windoze box. But your client or whoever would have to have installed it, haha. Of course, `grep` has one fatal flaw (per my understanding): it is single-thread ... it will be superseded eventually. – mike rodent May 30 '23 at 13:39
13

Concise and memory efficient:

#!/usr/bin/env python
# file: grep.py
import re, sys, collections

collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)

It works like egrep (without too much error handling), e.g.:

cat input-file | grep.py "RE"

And here is the one-liner:

cat input-file | python -c "import re,sys,collections;collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)" "RE"

Note that wrapping the map in collections.deque(..., maxlen=0) is required in Python 3 because map is lazy there: nothing is written until something consumes the iterator.
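The role of the zero-length deque can be seen in isolation with a small Python 3 sketch. The helper name consume and the sample lines are hypothetical, chosen only for this illustration:

```python
import collections
import sys


def consume(iterator):
    """Run an iterator to exhaustion, discarding every result."""
    # maxlen=0 means the deque never stores anything, so memory stays flat
    collections.deque(iterator, maxlen=0)


lines = ["alpha\n", "beta\n"]  # hypothetical stand-in for sys.stdin

# In Python 3, map() returns a lazy iterator: nothing has been written yet
pending = map(sys.stdout.write, lines)

# Pulling the items through the iterator is what triggers each write
consume(pending)
```

A plain for loop over the generator does the same job and is usually easier to read; the deque form mainly helps when everything has to fit in one expression.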

Giancarlo Sportelli
9

Adapted from a grep in python.

Accepts a list of filenames via sys.argv[2:] and does no exception handling:

#!/usr/bin/env python
import re, sys, os

for f in filter(os.path.isfile, sys.argv[2:]):
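    with open(f) as fh:  # close the file when done
        for line in fh:
            # re.search matches anywhere in the line; re.match only at the start
            if re.search(sys.argv[1], line):
                sys.stdout.write(line)  # line already ends with a newline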

sys.argv[1] and sys.argv[2:] work as shown if you run the script as a standalone executable, which means running chmod +x on it first.

miku
  • what's the difference between `re.match` and `re.search` ? – OscarRyz Dec 17 '09 at 14:10
  • 2
    @OscarRyz see [Nick Fortescue's top answer](http://stackoverflow.com/a/1921932/327074): "`search` instead of `match` to find anywhere in string" – icc97 Oct 19 '16 at 08:57
5
  1. use sys.argv to get the command-line parameters
  2. use open(), read() to manipulate file
  3. use the Python re module to match lines
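The three steps above can be sketched in Python 3 roughly as follows. The names grep and main are hypothetical, and the sketch iterates over the file line by line instead of calling read(), so that large files are not loaded into memory at once:

```python
import re
import sys


def grep(pattern, lines):
    """Yield each line in `lines` that contains a match for `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def main(argv):
    # Step 1: argv[1] is the pattern, argv[2] is the file to search
    pattern, path = argv[1], argv[2]
    # Step 2: open the file; iterating over it reads one line at a time
    with open(path) as handle:
        # Step 3: let the re module decide which lines to print
        for line in grep(pattern, handle):
            sys.stdout.write(line)  # each line keeps its own newline


# Only run when both arguments were supplied on the command line
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Keeping the matching in a separate generator means the same function works on any iterable of lines, for example sys.stdin.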
jldupont
3

You might be interested in pyp. Citing my other answer:

"The Pyed Piper", or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.

Community
Piotr Dobrogost
3

You can use python-textops3 :

from textops import *

print('\n'.join(cat(f) | grep(search_term)))

With python-textops3 you can use Unix-like commands with pipes.

Eric
2

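The real problem is that the variable line always has a value, so "if line == None:" is never true. The test for "no matches found" is whether there was a match, so that check should be replaced with an "else:" attached to the "if re.search(...)" test. Note that an else at that level prints the message once for every non-matching line, not once per file.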
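One way to print the message once per file rather than once per line is to collect the matches first and decide afterwards. This is a minimal Python 3 sketch, not the asker's code; report and sample are hypothetical names:

```python
import re


def report(pattern, lines):
    """Return the matching lines, or a single notice if none matched."""
    matches = [line for line in lines if re.search(pattern, line)]
    # Decide once, after every line has been checked
    if not matches:
        return ["no matches found\n"]
    return matches


sample = ["foo\n", "bar\n", "foobar\n"]  # hypothetical file contents
print("".join(report("foo", sample)), end="")
print("".join(report("xyz", sample)), end="")
```

This holds all matching lines in memory, which is fine for small files; for large ones a boolean flag set inside the loop avoids that cost.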

richard
1

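Not sure if your question was clear to me, but to fix your code, record whether any line matched and test that after the loop:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]
found = False  # becomes True once any line matches
with open(f, 'r') as file:
    # enumerate supplies the 1-based line number
    for n, line in enumerate(file, start=1):
        if re.search(search_term, line):
            found = True
            # strip the newline so the message stays on one line
            print(f"{line.rstrip(chr(10))} found at line {n}")
# checking a flag, not the last search result, covers the whole file
if not found:
    print('no matches found')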

PS: I tested it on Python 3.8.10

If you want to use grep instead, you could run:

grep -E '(.*)word(.*)' file.txt || echo "pattern not found"
Sergei Krivonos
brunocrt