Inspired by falsetru's answer, I rewrote my code to make it more generic. Now the files to explore can be described either by a string passed as second argument, which will be used by glob(), or by a function written specifically for this purpose when the set of desired files can't be expressed with a glob-style pattern; and they may lie in the current directory if no third argument is passed, or in a specified directory whose path is passed as the third argument (see the examples after the code).
import re, glob
from itertools import ifilter
from os import getcwd, listdir, path
from inspect import isfunction

# from a line containing word1, match up to the first following word3
regx = re.compile('^[^\n]*word1.*?word3.*?$', re.S | re.M)

G = ('\n\n'
    'MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW\n'
    'MWMWMW %s\n'
    'MWMWMW %s\n'
    '%s%s')

def search(REGX, how_to_find_files, dirpath='',
           G=G, sepm='\n======================\n'):
    if dirpath == '':
        dirpath = getcwd()
    if isfunction(how_to_find_files):
        # the second argument is a filtering function: keep the files
        # of dirpath for which it returns True
        full_paths = (path.join(dirpath, name) for name in listdir(dirpath))
        gen = ifilter(how_to_find_files, ifilter(path.isfile, full_paths))
    elif isinstance(how_to_find_files, str):
        # the second argument is a glob pattern
        gen = glob.glob(path.join(dirpath, how_to_find_files))
    for fn in gen:
        with open(fn) as fp:
            found = REGX.findall(fp.read())
        if found:
            yield G % (dirpath, path.basename(fn),
                       sepm, sepm.join(found))
# Example of searching in .txt files
#============ one use ===================
def select(fn):
return fn[-4:]=='.txt'
print ''.join(search(regx, select))
#============= another use ==============
print ''.join(search(regx,'*.txt'))
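The directory to search can also be given explicitly as the third argument; the path below is only a placeholder to show the call:

#========= with an explicit directory ===
# 'C:/my/text/files' is a placeholder path, not a real one
print ''.join(search(regx, '*.txt', 'C:/my/text/files'))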
The advantage of chaining the treatment of several files through a succession of generators is that the final ''.join() builds one single string that is written to the screen in one go, whereas printing several individual strings one after the other takes longer because of the display interrupt after each of them.
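For comparison, here is a minimal sketch of the alternative I mean, writing each result as soon as it is produced instead of joining them first:

from sys import stdout
# same output, but written piece by piece, one display interrupt per file
for result in search(regx, '*.txt'):
    stdout.write(result)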