
I have a big file and n regexes.

I want to match all n regexes against the file, but read the file only once. Here is my pseudocode, which loops over every regex for every line:

import re

with open("file.txt") as f:
    for line in f:
        for regex in regexes:
            m = re.search(regex, line)
            if m is not None:
                pass  # do something

Another version, where I write n if-elif branches:

import re

with open("file.txt") as f:
    for line in f:
        if re.search(regex1, line):
            pass  # do something1
        elif re.search(regex2, line):
            pass  # do something2
        elif re.search(regex3, line):
            pass  # do something3
        # ... more elif branches, one per regex
        else:
            pass

I don't like either approach. What is a better way to do this in Python?

Ankur Agarwal
  • Can you give an example of a line to match and the regexes you would match it against? – cs95 Jun 30 '17 at 17:53
  • A dictionary: keys are your regexes and values are the functions to apply given a regex. – Abdou Jun 30 '17 at 17:53
  • Is it important for you to know **which** regexp matched, or just that there was a match? – Błotosmętek Jun 30 '17 at 17:57
  • @Błotosmętek It is important to know which one matched. Do my code examples not make that clear? If not, I can update them. – Ankur Agarwal Jun 30 '17 at 17:59
  • @abc, these somethings of yours need to be defined as functions and mapped to their regexes via a Python dictionary. That doesn't sound like a better idea to you? – Abdou Jun 30 '17 at 18:16
  • @Abdou Yes, that is one other approach. But how do I avoid the loop for each line? – Ankur Agarwal Jun 30 '17 at 18:34
  • If you want to both know the matched regex and have access to the corresponding function/operation, you will have to have a nested loop. – Abdou Jun 30 '17 at 18:47
  • I've never actually used named regex groups, so I can't give solid advice, but this question might be helpful: https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for – aberger Jul 03 '17 at 14:06
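
The dictionary approach suggested in the comments above could look roughly like the sketch below. The patterns and the handlers `on_error` and `on_warning` are hypothetical placeholders, not taken from the question. Each pattern is compiled once up front, and the per-line loop remains, as the comments point out.

```python
import re

# Hypothetical handlers, for illustration only.
def on_error(line):
    return "error: " + line

def on_warning(line):
    return "warning: " + line

# Map each regex (as a string) to the function to apply on a match.
HANDLERS = {
    r"ERROR": on_error,
    r"WARN": on_warning,
}

# Compile every pattern once instead of on every line.
COMPILED = [(re.compile(pattern), handler) for pattern, handler in HANDLERS.items()]

def dispatch(line):
    """Call the handler of the first pattern that matches, else return None."""
    for pattern, handler in COMPILED:
        if pattern.search(line):
            return handler(line)
    return None
```

Because dictionaries preserve insertion order, earlier entries take priority, which mirrors the if-elif chain in the question.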

1 Answer


Use one regex to rule them all:

new_regex = '(' + ')|('.join(my_list_of_regexes) + ')'

You probably want to use string formatting instead of the + operators, but you get the point.
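
Since the question needs to know which regex matched, one way to extend the combined pattern is to wrap each alternative in a named group and read `lastgroup` from the match object. This is a minimal sketch, not necessarily what the answer intended: the names and patterns in `PATTERNS` are made up for illustration, and it assumes the individual regexes contain no capturing groups or backreferences of their own.

```python
import re

# Hypothetical names and patterns, for illustration only.
# Assumption: none of them contains its own capturing groups.
PATTERNS = {
    "error": r"ERROR\b",
    "warning": r"WARN(?:ING)?\b",
    "number": r"\d+",
}

# Wrap each pattern in a named group and join the groups with "|".
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items())
)

def classify(line):
    """Return the name of the alternative that matched, or None."""
    m = COMBINED.search(line)
    return m.lastgroup if m else None
```

One behavioural difference from the if-elif chain: `search` returns the leftmost match in the line, so for a line such as "42 ERROR" this sketch reports "number", whereas an if-elif chain that tests the error regex first would take the error branch. Ordering among alternatives only decides ties at the same position.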

Re.po