5

So I get some input in Python that I need to parse using regexps.

At the moment I'm using something like this:

matchOK = re.compile(r'^OK\s+(\w+)\s+(\w+)$')
matchFailed = re.compile(r'^FAILED\s(\w+)$')
#.... a bunch more regexps

for l in big_input:
  match = matchOK.search(l)
  if match:
     #do something with match
     continue
  match = matchFailed.search(l)
  if match:
     #do something with match
     continue
  #.... a bunch more of these 
  # Then some error handling if nothing matches

Now usually I love Python because it's nice and succinct. But this feels verbose. I'd expect to be able to do something like this:

for l in big_input:      
  if match = matchOK.search(l):
     #do something with match     
  elif match = matchFailed.search(l):
     #do something with match 
  #.... a bunch more of these
  else:
    # error handling

Am I missing something, or is the first form as neat as I'm going to get?
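
For readers on Python 3.8 or later: the assignment expression operator `:=` (PEP 572) allows almost exactly the second form. What follows is a minimal sketch rather than a definitive implementation. It reuses the two patterns from above, while the `classify` function, its return values, and the sample lines are hypothetical and exist only for illustration.

```python
import re

pat_ok = re.compile(r'^OK\s+(\w+)\s+(\w+)$')
pat_failed = re.compile(r'^FAILED\s(\w+)$')

def classify(line):
    """Return a tuple describing the first pattern that matches the line."""
    if match := pat_ok.search(line):
        return ('ok', match.group(1), match.group(2))
    elif match := pat_failed.search(line):
        return ('failed', match.group(1))
    else:
        # error handling would go here
        return ('error', line)

# Hypothetical sample input
big_input = ['OK foo bar', 'FAILED baz', 'garbage']
results = [classify(l) for l in big_input]
```

On interpreters older than 3.8 the `:=` lines are a syntax error, so the workarounds in the answers still apply there.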

SiggyF
Michael Anderson
  • Duplicate of http://stackoverflow.com/questions/2554185/match-groups-in-python and http://stackoverflow.com/questions/122277/how-do-you-translate-this-regular-expression-idiom-from-perl-into-python ? – Curd Apr 01 '11 at 08:47
  • I think your first approach is clear enough and will be easy to grok a year from now. Personally, I would change the name of matchOK and matchFailed to patOK and patFailed because they are pattern objects, not match objects. I suspect you are overusing regular expressions -- my approach would be to use `if l.startswith('OK '):` and `if l.startswith('FAILED '):`, etc. – Steven Rumbalski Apr 01 '11 at 08:48
  • @Curd Yep it seems that the first of those is almost equivalent and its answer seems like the best. – Michael Anderson Apr 01 '11 at 09:22
  • @Steven Rumbalski This is a simplification. The real regexps are significantly nastier. – Michael Anderson Apr 01 '11 at 09:26
  • This is a very small point, but you do not actually have to keep the patterns around; you can just do something at the top of your file like `searchOK = re.compile(r'^OK\s+(\w+)\s+(\w+)$').search` and then later say `match = searchOK(string)`. – Brandon Rhodes Apr 01 '11 at 14:24
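
The suggestion in the last comment can be sketched as follows. The pattern is the one from the question; the sample strings are made up for illustration.

```python
import re

# Keep only the bound search method. The compiled pattern stays alive
# because the bound method holds a reference to it.
searchOK = re.compile(r'^OK\s+(\w+)\s+(\w+)$').search

match = searchOK('OK alpha beta')        # hypothetical input line
groups = match.groups() if match else None

no_match = searchOK('something else')    # returns None when nothing matches
```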

4 Answers

3
class helper:
    def __call__(self, match):
        # remember the match so the caller can use it afterwards
        self.match = match
        return bool(match)

h = helper()
for l in big_input:
    if h(matchOK.search(l)):
        pass  # do something with h.match
    elif h(matchFailed.search(l)):
        pass  # do something with h.match
    # ... a bunch more of these
    else:
        pass  # error handling

Or matchers as class methods:

class matcher:
    def __init__(self):
        # compile matchers; each attribute must be callable on a line,
        # e.g. the bound .search method of a compiled pattern
        self.ok = ...
        self.failed = ...
        # self.... = ...

    def matchOK(self, l):
        self.match = self.ok(l)
        return bool(self.match)

    def matchFailed(self, l):
        self.match = self.failed(l)
        return bool(self.match)

    # def match...(self, l):
    #     ...

m = matcher()
for l in big_input:
    if m.matchOK(l):
        pass  # do something with m.match
    elif m.matchFailed(l):
        pass  # do something with m.match
    # ... a bunch more of these
    else:
        pass  # error handling
eat
  • You need colons after your `if` and `else` clauses; and, you do not need to compare your match against `None` because, according to the docs, “Match Objects always have a boolean value of True, so that you can test whether e.g. match() resulted in a match with a simple if statement.” http://docs.python.org/library/re.html#match-objects – Brandon Rhodes Apr 01 '11 at 09:58
  • @Brandon: Clearly this is not an implementation but rather rough pseudocode to demonstrate the ideas! Yes, the actual implementation can be streamlined; here the `None` treatment is just emphasizing the point. Thanks – eat Apr 01 '11 at 10:35
  • You could “emphasize the point” in one line rather than three by replacing the big `if`-`else` constructs with `return bool(match)` which still tells the Python programmer unambiguously that you are returning a value to be used as a true/false decision, but using much less code. – Brandon Rhodes Apr 01 '11 at 14:21
  • @Brandon: fair enough suggestion. Thanks – eat Apr 01 '11 at 14:31
0

How about something like:

for l in big_input:
    for p in (matchOK, matchFailed): # other patterns go in here
        match = p.search(l)
        if match:
            break
    if not match:
        p = None # no patterns matched
    if p is matchOK:
        pass # do something with match
    elif p is matchFailed:
        pass # do something with match
    #.... a bunch more of these
    else:
        assert p is None
        # Then some error handling if nothing matches
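
A variation on the same idea, offered as a sketch only: pairing each pattern with a handler function removes the second `if`/`elif` chain, and falling out of the inner loop covers the no-match case. The patterns are the ones from the question; the handler names, their return values, the `process` function, and the sample input are hypothetical.

```python
import re

def handle_ok(match):
    return 'ok: %s, %s' % match.groups()

def handle_failed(match):
    return 'failed: %s' % match.group(1)

# (pattern, handler) pairs, tried in order; the first match wins.
handlers = [
    (re.compile(r'^OK\s+(\w+)\s+(\w+)$'), handle_ok),
    (re.compile(r'^FAILED\s(\w+)$'), handle_failed),
]

def process(line):
    for pattern, handler in handlers:
        match = pattern.search(line)
        if match:
            return handler(match)
    # No pattern matched: error handling goes here.
    return 'error: %r' % line

# Hypothetical sample input
output = [process(l) for l in ['OK foo bar', 'FAILED baz', 'E = mc2']]
```

Adding a new case then means appending one pair to `handlers` instead of editing two separate places in the loop.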
Tom Anderson
0

What about something like this?

import re


def f_OK(ch):
    print('BINGO ! : %s , %s' % re.match(r'OK\s+(\w+)\s+(\w+)', ch).groups())

def f_FAIL(ch):
    print('only one : ' + ch.split()[-1])

several_func = (f_OK, f_FAIL)


several_REs = (r'OK\s+\w+\s+\w+',
               r'FAILED\s+\w+')

# One capturing group per RE; the non-capturing wrapper makes ^ and $
# apply to every alternative rather than only the first and the last.
globpat = re.compile('^(?:(' + ')|('.join(several_REs) + '))$')


with open('big_input.txt') as handle:
    for i, line in enumerate(handle):
        print('line ' + str(i) + ' - ', end=' ')
        mat = globpat.search(line)
        if mat:
            # lastindex is the number of the group that matched
            several_func[mat.lastindex - 1](mat.group())
        else:
            print('## no match ## ' + repr(line))

I tried it on a file whose content is:

OK tiramisu sunny   
FAILED overclocking   
FAILED nuclear    
E = mcXc    
OK the end  

the result is

line 0 -  BINGO ! : tiramisu , sunny
line 1 -  only one : overclocking
line 2 -  only one : nuclear
line 3 -  ## no match ## 'E = mcXc\n'
line 4 -  BINGO ! : the , end

This lets you define any number of REs and their functions separately, so it is easy to add or remove some.
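
One possible refinement, sketched under assumptions: giving each alternative a named group lets the match's `lastgroup` attribute select the handler by name, so the REs and functions no longer have to be kept in the same order. The group names, the handler functions, and their return values are hypothetical. The REs contain no nested capturing groups, which keeps `lastgroup` unambiguous.

```python
import re

# Hypothetical handlers, keyed by the name of the group that matched.
def on_ok(text):
    _, first, second = text.split()
    return ('ok', first, second)

def on_failed(text):
    return ('failed', text.split()[-1])

handlers = {'OK': on_ok, 'FAILED': on_failed}

named_REs = {'OK': r'OK\s+\w+\s+\w+',
             'FAILED': r'FAILED\s+\w+'}

# Builds ^(?:(?P<OK>...)|(?P<FAILED>...))$
combined = re.compile(
    '^(?:' + '|'.join('(?P<%s>%s)' % item for item in named_REs.items()) + ')$'
)

def dispatch(line):
    mat = combined.search(line)
    if mat is None:
        return None  # no pattern matched
    return handlers[mat.lastgroup](mat.group())

results = [dispatch(l) for l in
           ('OK tiramisu sunny\n', 'FAILED nuclear\n', 'E = mcXc\n')]
```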

eyquem
-1

Even better, how about a slightly simpler version of eat's code using a nested function:

import re

matchOK = re.compile("ok")
matchFailed = re.compile("failed")
big_input = ["ok to begin with", "failed later", "then gave up"]

for l in big_input:
    match = None
    def matches(pattern):
        global match  # rebind the module-level name, not a local one
        match = pattern.search(l)
        return match
    if matches(matchOK):
        print("matched ok:", l, match.start())
    elif matches(matchFailed):
        print("failed:", l, match.start())
    else:
        print("ignored:", l)

Note that this works only if the loop is at the top level of the code; it is not easily converted into a function, because the variable match has to be a true global at the top level.
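
In Python 3, the `nonlocal` statement lifts that restriction: a nested helper can rebind a variable of the enclosing function. A minimal sketch, assuming Python 3; the `classify` wrapper and the result tuples are hypothetical, and the helper takes the line as an explicit argument instead of reading the loop variable.

```python
import re

matchOK = re.compile("ok")
matchFailed = re.compile("failed")

def classify(lines):
    """Same nested-helper idea, but wrapped in a function."""
    results = []
    match = None

    def matches(pattern, line):
        nonlocal match  # rebind the enclosing function's variable
        match = pattern.search(line)
        return match

    for l in lines:
        if matches(matchOK, l):
            results.append(("matched ok", l, match.start()))
        elif matches(matchFailed, l):
            results.append(("failed", l, match.start()))
        else:
            results.append(("ignored", l))
    return results

results = classify(["ok to begin with", "failed later", "then gave up"])
```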

Tom Anderson
  • Using `!=` to test object identity is generally considered bad form. – Brandon Rhodes Apr 01 '11 at 09:54
  • @Brandon: i'm not using it to test identity, i'm using it to test non-noneness, which is a rather specific kind of identity. Either way, i have never come across the idea that using != in this way is a problem - could you point me at something i could read about this? – Tom Anderson Apr 01 '11 at 09:56
  • “Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.” — http://www.python.org/dev/peps/pep-0008/ – Brandon Rhodes Apr 01 '11 at 10:00
  • Also: http://jaredgrubb.blogspot.com/2009/04/python-is-none-vs-none.html and read *all* of the answers to http://stackoverflow.com/questions/3257919/is-none-vs-none – Brandon Rhodes Apr 01 '11 at 10:01
  • And, anyway, you do not need to compare your match against `None` because, according to the docs, “Match Objects always have a boolean value of True, so that you can test whether e.g. match() resulted in a match with a simple if statement.” http://docs.python.org/library/re.html#match-objects – Brandon Rhodes Apr 01 '11 at 10:02
  • @Brandon: fair enough. Fixed. – Tom Anderson Apr 01 '11 at 10:04
  • Looks much better! Now the only problem is that the `match =` assignment inside the function won't actually affect the value of `match` out in the parent function's scope because of Python's scoping rules (try it!). :) – Brandon Rhodes Apr 01 '11 at 10:08
  • @Brandon: perhaps you could give it a go: http://pastebin.com/jKwfGLNK - works for me with 2.6.4; is this something that's changed in 3.0? Or are you thinking of assignments not binding to *global* names (without the `global` qualifier)? – Tom Anderson Apr 01 '11 at 10:17
  • @Brandon: also, i read the links you posted, and i have to say i remain unconvinced. As the commenter on Jared Grubb's post says, if an object defines itself as being equal to None, why shouldn't you respect that? – Tom Anderson Apr 01 '11 at 10:19
  • @Tom: if you re-check the question, the code blocks beneath the `if` statements say “do something with match” — he wants to get the `match` object and call `.group(x)` or something on it. If you will simply update the `print` statements in your PasteBin example so that they actually try to examine `match`, you will see that it remains `None` out in the scope outside of the `matches()` function. – Brandon Rhodes Apr 01 '11 at 14:15
  • @Brandon: ah bugrit. I worked this out in the interpreter, then rewrote it here without checking it. It's missing a global declaration on match - then it works. But it will only work as code at the top level, not inside a function. – Tom Anderson Apr 01 '11 at 14:42
  • @Tom: agreed. And my assumption was that this code would wind up inside a function. So I think I like the other approaches we are seeing here, where the match is kept as an object attribute or something. – Brandon Rhodes Apr 01 '11 at 14:58