14

My dilemma: I'm passing my function a string on which I then need to perform numerous regex manipulations. The logic is: if there's a match with the first regex, do one thing. If there's no match, check for a match with the second and do something else; if that fails, check the third, and so forth. I could do something like this:

if re.match('regex1', string):
    match = re.match('regex1', string)
    # Manipulate match.group(n) and return
elif re.match('regex2', string):
    match = re.match('regex2', string)
    # Do second manipulation
[etc.]

However, this feels unnecessarily verbose, and usually when that's the case it means there's a better way that I'm either overlooking or don't yet know about.

Does anyone have a suggestion for a better way to do this (better from a code-appearance standpoint, a memory usage standpoint, or both)?

One Crayon
  • Duplicate: http://stackoverflow.com/questions/122277/how-do-you-translate-this-regular-expression-idiom-from-perl-into-python – S.Lott Feb 28 '09 at 12:11

7 Answers

24

Generally speaking, in these sorts of situations, you want to make the code "data driven". That is, put the important information in a container, and loop through it.

In your case, the important information is (string, function) pairs.

import re

def fun1():
    print('fun1')

def fun2():
    print('fun2')

def fun3():
    print('fun3')

regex_handlers = [
    (r'regex1', fun1),
    (r'regex2', fun2),
    (r'regex3', fun3)
    ]

def example(string):
    for regex, fun in regex_handlers:
        if re.match(regex, string):
            fun()  # call the function
            break

example('regex2')
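
Since the question needs to manipulate match.group(n), the same table can hand the match object to each handler. This is only a sketch: the patterns, the handler names (handle_date, handle_number) and the dispatch function are made up for illustration and do not come from the question.

```python
import re

def handle_date(match):
    # Hypothetical handler: reorder a YYYY-MM-DD date as DD/MM/YYYY.
    year, month, day = match.groups()
    return "{}/{}/{}".format(day, month, year)

def handle_number(match):
    # Hypothetical handler: double a plain integer.
    return int(match.group(0)) * 2

# Patterns are compiled once and tried in list order.
REGEX_HANDLERS = [
    (re.compile(r'(\d{4})-(\d{2})-(\d{2})$'), handle_date),
    (re.compile(r'\d+$'), handle_number),
]

def dispatch(string):
    """Return the result of the first handler whose regex matches, else None."""
    for pattern, handler in REGEX_HANDLERS:
        match = pattern.match(string)
        if match:
            return handler(match)
    return None
```

Each regex is matched only once per string, and adding a new case means adding one row to the list.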
dan-gph
  • Thanks for this suggestion! This was what I was probably going to end up doing, but the overridden version of the re module is a slightly better fit for this project. – One Crayon Feb 28 '09 at 16:23
13

Similar question from back in September: How do you translate this regular-expression idiom from Perl into Python?

Using global variables in a module may not be the best way to do it, but the same idea can be converted into a class:

import re

class Re(object):
    """Thin wrapper around re that remembers the most recent match."""
    def __init__(self):
        self.last_match = None

    def match(self, pattern, text):
        self.last_match = re.match(pattern, text)
        return self.last_match

    def search(self, pattern, text):
        self.last_match = re.search(pattern, text)
        return self.last_match

text = 'foobar'  # sample input, for illustration only
gre = Re()
if gre.match(r'foo', text):
    pass  # do something with gre.last_match
elif gre.match(r'bar', text):
    pass  # do something with gre.last_match
else:
    pass  # do something else
Markus Jarderot
  • Thanks for the link! I didn't find that topic in my search, but it's spot on for what I'm trying to do. I like the idea of using a class rather than a module, too. – One Crayon Feb 28 '09 at 16:22
2

I had the same problem as you. Here's my solution:

import re

regexp = {
    'key1': re.compile(r'regexp1'),
    'key2': re.compile(r'regexp2'),
    'key3': re.compile(r'regexp3'),
    # ...
}

def test_all_regexp(string):
    for key, pattern in regexp.items():
        m = pattern.match(string)
        if m:
            # do what you want
            break

It's a slightly modified version of the solution from an answer to "Extracting info from large structured text files".
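
One caveat: the question needs the regexes tried in a fixed order, and a plain dict only guarantees insertion order from Python 3.7 onwards. A list of (key, pattern) pairs keeps the order explicit on any version. The following is a rough sketch; the keys, the patterns and the first_match name are placeholders, not part of the original answer.

```python
import re

# Hypothetical patterns, listed from most specific to least specific.
ORDERED_PATTERNS = [
    ('float', re.compile(r'\d+\.\d+$')),
    ('int', re.compile(r'\d+$')),
    ('word', re.compile(r'[A-Za-z]+$')),
]

def first_match(string):
    """Return (key, match) for the first pattern that matches, else (None, None)."""
    for key, pattern in ORDERED_PATTERNS:
        m = pattern.match(string)
        if m:
            return key, m
    return None, None
```

The caller can then branch on the returned key and read groups from the returned match object.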

Luiz Damim
1

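Hmm... you could build something on the with statement, along these lines:

import re

class rewrapper(object):
    def __init__(self, pattern, target):
        # run the match up front; None if the pattern does not match
        self.match = re.match(pattern, target)

    def __enter__(self):
        # the value bound by "as" is the match object (or None)
        return self.match

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # do not suppress exceptions

string = 'regex1'  # sample input, for illustration only

with rewrapper("regex1", string) as match:
    if match:
        pass  # etc.

with rewrapper("regex2", string) as match:
    if match:
        pass  # and so forth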
SingleNegationElimination
0

Are the manipulations for each regex similar? If so, try this:

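import re

def manipulate(string):
    for regex in ('regex1', 'regex2', 'regex3', 'regex4'):
        match = re.match(regex, string)
        if match:
            # Manipulate match.group(n); group(0) stands in for the real work
            return match.group(0)
    return None  # no regex matched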
Seun Osewa
  • Unfortunately the manipulations vary for the different regex; in retrospect, I should have specified that in the question. – One Crayon Feb 28 '09 at 05:05
0

Written this way, each regex appears once and is matched only once:

match = re.match('regex1', string)
if match:
    # do stuff
    return

match = re.match('regex2', string)
if match:
    # do stuff
    return
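
If Python 3.8 or later is available (an assumption; this thread predates it), an assignment expression lets the if/elif chain from the question be kept while each match still runs only once. A small sketch with placeholder patterns and a made-up classify function:

```python
import re

def classify(string):
    # Each branch binds the match object and tests it in one step.
    if (match := re.match(r'(\d+)$', string)):
        return 'number', int(match.group(1))
    elif (match := re.match(r'([a-z]+)$', string)):
        return 'word', match.group(1).upper()
    return 'unknown', None
```

Later regexes are only evaluated when the earlier ones fail, just as in the early-return version above.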
Daniel Von Fange
0
import operator as op
import re

class RegexStore(object):
    def __init__(self, pat_list):
        # build RegEx searches as (name, compiled pattern) pairs
        self._searches = [(name, re.compile(pat, re.VERBOSE))
                          for name, pat in pat_list]

    def match(self, text):
        # lazily pair each name with its match result (None on failure)
        match_all = ((x, y.match(text)) for x, y in self._searches)
        try:
            # first pair whose match object is truthy
            return next(filter(op.itemgetter(1), match_all))
        except StopIteration:
            # instead of 'name' in the first slot, return the unmatched 'text' line
            return (text, None)

You can use this class like so:

rs = RegexStore((('pat1', r'.*STRING1.*'),
                 ('pat2', r'.*STRING2.*')))
name, match = rs.match("MY SAMPLE STRING1")

if name == 'pat1':
    print('found pat1')
elif name == 'pat2':
    print('found pat2')
cmcginty