I am parsing a large sample of text files with regular expressions (RE). From each file, I want to extract the part of the text that contains 'vu' and ends with a newline '\n'.
The patterns differ from one file to another, so I tried to search for a combination of REs using the OR operator. However, I have not found a way to make a single re.findall() call look for any one of several REs.
Here is an example of how I tried to tackle this, but apparently re.findall() still does not evaluate both of my regular expressions together with the OR operator:
import re

def series2string(myserie):
    myserie2 = ' or '.join(serie for serie in myserie)
    return myserie2

def expression(pattern, mystring):
    x = re.findall(pattern, mystring)
    if len(x) > 0:
        return 1
    else:
        return 0

# text example
text = "\n\n (troisième chambre)\n i - vu la requête, enregistrée le 28 février 1997 sous le n° 97nc00465, présentée pour m. z... farinez, demeurant ... à dommartin-aux-bois (vosges), par me y..., avocat ;\n"

# expressions to look for (raw strings, so the backslashes reach the regex engine unchanged)
pattern1 = r'^\s*vu.*\n'
pattern2 = r'^\s*\(\w*\s*\w*\)\s*.*?vu.*\n'
pattern = [pattern1, pattern2]
pattern = series2string(pattern)
expression(pattern, text)
Note: I worked around this problem by searching for each pattern in a for loop, but my code would run faster if I could call re.findall() just once.
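One direction that might work here is regex alternation: the re module treats ' or ' as literal text to match, whereas '|' is the regex OR operator. Below is a minimal sketch under that assumption. The helper names combine_patterns and has_match are hypothetical (they are not part of my original code), and the loop at the end only mirrors the workaround described above for comparison.

```python
import re

def combine_patterns(patterns):
    # Wrap each alternative in a non-capturing group so that "|" applies to
    # the whole sub-pattern, then join the groups with the regex OR operator.
    return "|".join(f"(?:{p})" for p in patterns)

def has_match(compiled, mystring):
    # search() stops at the first match, which is enough for a 0/1 flag.
    return 1 if compiled.search(mystring) else 0

text = "\n\n (troisième chambre)\n i - vu la requête, enregistrée le 28 février 1997 sous le n° 97nc00465, présentée pour m. z... farinez, demeurant ... à dommartin-aux-bois (vosges), par me y..., avocat ;\n"

pattern1 = r'^\s*vu.*\n'
pattern2 = r'^\s*\(\w*\s*\w*\)\s*.*?vu.*\n'
patterns = [pattern1, pattern2]

# Compile once so the same combined pattern can be reused across many files.
combined = re.compile(combine_patterns(patterns))
result = has_match(combined, text)

# Loop-based workaround, for comparison: one search per pattern.
loop_result = int(any(re.search(p, text) for p in patterns))
```

Because the groups are non-capturing, combined.findall(text) would still return whole matches rather than group contents. Also note that no flags are passed here, so '^' only anchors at the very start of the string; re.MULTILINE would be needed for it to match at the start of each line.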