I have a text file that looks like this (close to 1,500,000 lines, with roughly 5-120 words per line, so line lengths vary a lot):
This is a foo bar sentence.
What are you sure a foo bar? or a foo blah blah.
blah blah foo sheep have you any bar?
...
I want to search for lines that contain a phrase, let's say "foo bar", and return at most 10,000 matching lines. So in Python, I wrote this:
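import subprocess
# frag (the phrase) and deuroparl (path to the corpus file) are defined earlier
# Passing a list avoids shell quoting problems; -m needs a plain integer (10,000 with a comma is rejected)
cmd = ['grep', '-m', '10000', '--', frag, deuroparl]
results = subprocess.run(cmd, capture_output=True, text=True).stdout.splitlines(keepends=True)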
What is the "proper" way to do this in Python, without cheating by calling grep?
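A minimal pure-Python sketch of the same search, assuming the file is UTF-8 and that a literal substring match (like `grep -F`) is what is wanted rather than a regex. `search_lines` and its `limit` parameter are hypothetical names. Iterating over the file object streams it line by line, and `itertools.islice` stops reading once `limit` matches have been collected, which mirrors what `grep -m` does.

```python
from itertools import islice


def search_lines(path, phrase, limit=10000):
    """Return up to `limit` lines of the file at `path` that contain `phrase`.

    Plain substring test (no regex), like `grep -F`; reading stops as soon
    as `limit` matches have been found, like `grep -m limit`.
    """
    # Assumption: the corpus is UTF-8 encoded.
    with open(path, encoding='utf-8') as f:
        # The generator reads one line at a time, so the whole file is never in memory.
        matches = (line for line in f if phrase in line)
        # list() consumes the generator while the file is still open.
        return list(islice(matches, limit))
```

With the variables from the question, `results = search_lines(deuroparl, frag)` produces the same kind of list as the grep version (each line keeps its trailing newline), provided the phrase contains no regex metacharacters that grep would have interpreted differently.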
Will it be faster than grep (see "How does grep run so fast?")?
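Whether pure Python beats grep is something to measure on the real file rather than assume. The harness below is a sketch under these assumptions: a grep binary on `PATH`, UTF-8 text, and literal (non-regex) matching on both sides so the comparison is fair. `grep_search`, `python_search` and `compare` are hypothetical names, not part of any library.

```python
import subprocess
import timeit
from functools import partial
from itertools import islice


def grep_search(path, phrase, limit=10000):
    # Run grep without a shell: -F = literal phrase, -m = stop after `limit` matching lines.
    cmd = ['grep', '-F', '-m', str(limit), '--', phrase, path]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return out.splitlines(keepends=True)


def python_search(path, phrase, limit=10000):
    # Pure-Python equivalent: stream the file, keep the first `limit` matching lines.
    with open(path, encoding='utf-8') as f:
        return list(islice((line for line in f if phrase in line), limit))


def compare(path, phrase, repeat=3):
    """Return the best-of-`repeat` wall-clock time in seconds for each approach."""
    timings = {}
    for fn in (grep_search, python_search):
        call = partial(fn, path, phrase)
        # number=1: each run searches the file once; min() discounts noise from other processes.
        timings[fn.__name__] = min(timeit.repeat(call, number=1, repeat=repeat))
    return timings
```

For example, `compare(deuroparl, 'foo bar')` returns a dict mapping each function name to its best time in seconds. Run it more than once: the first run may have to read the file from disk, while later runs are usually served from the OS page cache, which shifts the numbers for both approaches.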
Is there a faster way to do this?
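One technique that often speeds up this kind of scan in pure Python is to memory-map the file (the standard `mmap` module) and search the raw bytes, so that only matching lines are ever decoded instead of all 1.5 million. This is a sketch, not a guaranteed win over grep. It assumes UTF-8 (substring search on UTF-8 bytes is safe because an encoded character can only match at a character boundary), `\n` line endings, a non-empty file (an empty file cannot be memory-mapped), and a phrase that does not span lines. `search_lines_mmap` is a hypothetical name.

```python
import mmap


def search_lines_mmap(path, phrase, limit=10000):
    """Return up to `limit` lines (as str, without the trailing newline) containing `phrase`."""
    needle = phrase.encode('utf-8')
    results = []
    with open(path, 'rb') as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(needle)
        while pos != -1 and len(results) < limit:
            # Expand the hit to the full line containing it.
            # rfind returns -1 when no newline precedes the hit, so start becomes 0.
            start = mm.rfind(b'\n', 0, pos) + 1
            end = mm.find(b'\n', pos)
            if end == -1:  # last line has no trailing newline
                end = len(mm)
            results.append(mm[start:end].decode('utf-8'))
            # Resume after this line so a line with several hits is reported only once.
            pos = mm.find(needle, end)
    return results
```

Unlike a line-by-line loop, the inner search here runs in C over the whole mapping and Python code only executes once per matching line, which is where the potential speedup comes from; how much it helps on the real corpus still has to be measured.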