
I get deeply nested indentation when I write code like the following:

import re

match = re.search(some_regex_1, s)
if match:
    pass  # do something with match data
else:
    match = re.search(some_regex_2, s)
    if match:
        pass  # do something with match data
    else:
        match = re.search(some_regex_3, s)
        if match:
            pass  # do something with match data
        else:
            pass  # ... and so on

I tried to rewrite it as:

if match = re.search(some_regex_1, s):
    # ...
elif match = re.search(some_regex_2, s):
    # ...
elif ....
    # ...
...

but Python doesn't allow that syntax. What should I do to avoid deep indentation in this case?

Le Curious
  • The answers below address this specific case, but, generally, it's important to note that you need `if a == b: ... ` in Python, not just one equals sign as you have above. Or, depending on the scenario, `if a is b: ...` The `elif` statements above work in theory, but you need the equals-equals. – Karmel May 21 '12 at 18:45
  • @Karmel, I think he's trying to assign and test `match` for truthiness at the same time, like folks might do in C: `while (data=fread(fp)) {` – JoeFish May 21 '12 at 18:51
  • Good point. I thought it was a more general assumption. My apologies, @Le Curious :) – Karmel May 21 '12 at 18:55
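On Python 3.8 or later, the assign-and-test form attempted in the question is legal if you use an assignment expression (the "walrus" operator `:=`, PEP 572) instead of `=`. A minimal sketch, assuming Python 3.8+; the three patterns and the `classify` function are hypothetical stand-ins for `some_regex_1`, `some_regex_2`, and so on:

```python
import re

def classify(s):
    # Each condition assigns `match` and tests it in one step (Python 3.8+ only).
    if match := re.search(r"\d+", s):
        return "number", match.group()
    elif match := re.search(r"[A-Za-z]+", s):
        return "word", match.group()
    elif match := re.search(r"[.,;:!?]", s):
        return "punctuation", match.group()
    else:
        return "none", None

print(classify("abc 42"))  # -> ('number', '42')
```

The chain stays flat because each `elif` only runs when every earlier search returned `None`.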

3 Answers

import re

regexes = (regex1, regex2, regex3)
for regex in regexes:
    match = re.search(regex, s)
    if match:
        # do stuff with match
        break

Alternatively (more advanced):

import re

def process1(match_obj):
    pass  # handle match 1

def process2(match_obj):
    pass  # handle match 2

def process3(match_obj):
    pass  # handle match 3

# ... more handlers as needed

handler_map = ((regex1, process1), (regex2, process2), (regex3, process3))
for regex, handler in handler_map:
    match = re.search(regex, s)
    if match:
        result = handler(match)
        break
else:
    pass  # else condition if no regex matches
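A runnable sketch of the handler-map approach. The patterns and the `handle_*` and `dispatch` functions are hypothetical examples, not part of the answer; the loop's `else` block runs only when the loop finishes without hitting `break`, i.e. when no pattern matched.

```python
import re

def handle_date(m):
    # Hypothetical handler: unpack the three captured groups.
    year, month, day = m.groups()
    return f"date {year}-{month}-{day}"

def handle_email(m):
    return f"email user={m.group(1)} domain={m.group(2)}"

def handle_number(m):
    return f"number {int(m.group())}"

# Order matters: the first pattern that matches wins.
HANDLER_MAP = (
    (r"(\d{4})-(\d{2})-(\d{2})", handle_date),
    (r"([\w.]+)@([\w.]+)", handle_email),
    (r"\d+", handle_number),
)

def dispatch(s):
    for regex, handler in HANDLER_MAP:
        match = re.search(regex, s)
        if match:
            result = handler(match)
            break
    else:
        # Only reached when no pattern matched (the loop never hit `break`).
        result = None
    return result

print(dispatch("due 2012-05-21"))  # -> date 2012-05-21
```

Putting the date pattern before the bare `\d+` pattern keeps the more specific handler from being shadowed.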
Silas Ray
  • Nice! You probably want to enumerate the regexes so you can keep track of which regex was matched though. `for i, regex in enumerate(regexes): ...` – Junuxx May 21 '12 at 18:39
  • @Junuxx, wasn't in the spec, but sure, could be useful in some cases. You might also just want to know that something was matched to handle the final else condition. – Silas Ray May 21 '12 at 18:40
  • I assumed that the 'do stuff' could be different depending on which regex it matched and you don't want to test it again :p – Junuxx May 21 '12 at 18:41
  • I suggest adding a variable i so you can easily check which regex matched. After the loop, if i == the number of regexes, then no regex matched. – kravemir May 21 '12 at 18:42
  • Oh yeah, I see your points. Though then you end up with another if/elif/else construct... let me put in something better. – Silas Ray May 21 '12 at 18:42
  • Instead of 'if not result' you can use an 'else' block in the for loop. – Kamil Kisiel May 21 '12 at 18:51
  • @KamilKisiel Good call, changed. – Silas Ray May 21 '12 at 18:53
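The two suggestions from the comments can be combined: `enumerate` records which pattern matched, and the loop's `else` clause handles the no-match case without a sentinel value. A minimal sketch; `first_match` and the example patterns are hypothetical names chosen for illustration.

```python
import re

def first_match(patterns, s):
    """Return (index, match) for the first pattern that matches s, or (None, None)."""
    for i, regex in enumerate(patterns):
        match = re.search(regex, s)
        if match:
            break
    else:
        # The loop ran to completion without `break`: nothing matched.
        return None, None
    return i, match

patterns = (r"^\d+$", r"^[a-z]+$", r"\s")
index, match = first_match(patterns, "hello")
print(index, match.group())  # -> 1 hello
```

The caller can then branch on `index`, although, as noted in the comments, that reintroduces an if/elif chain; the handler map avoids it.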

If you can use finditer() instead of search() (most of the time you can), you could join all your regexes into one, give each alternative a named group, and check match.lastgroup to see which alternative matched. Here is an example:

import re

# Raw string so the backslash escapes reach the regex engine unchanged.
regex = r"""
   (?P<number> \d+ ) |
   (?P<word> \w+ ) |
   (?P<punctuation> \. | \! | \? | \, | \; | \: ) |
   (?P<whitespace> \s+ ) |
   (?P<eof> $ ) |
   (?P<error> \S )
"""

scan = re.compile(pattern=regex, flags=re.VERBOSE).finditer

for match in scan('Hi, my name is Joe. I am 1 programmer.'):
    token_type = match.lastgroup  # name of the group that matched
    if token_type == 'number':
        print('found number "%s"' % match.group())
    elif token_type == 'word':
        print('found word "%s"' % match.group())
    elif token_type == 'punctuation':
        print('found punctuation character "%s"' % match.group())
    elif token_type == 'whitespace':
        print('found whitespace')
    elif token_type == 'eof':
        print('done parsing')
        break
    else:
        raise ValueError('String kaputt!')
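A variation on the same technique, assuming you want to collect the tokens rather than print them: gather `(lastgroup, text)` pairs and skip whitespace. The `tokenize` function and the slightly reduced pattern (a character class instead of the punctuation alternation, no `eof` group) are illustrative choices, not part of the original answer.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<number> \d+ ) |
    (?P<word> \w+ ) |
    (?P<punctuation> [.!?,;:] ) |
    (?P<whitespace> \s+ ) |
    (?P<error> \S )
""", re.VERBOSE)

def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the named group that matched
        if kind == "whitespace":
            continue
        if kind == "error":
            raise ValueError(f"unexpected character {match.group()!r} at {match.start()}")
        tokens.append((kind, match.group()))
    return tokens

print(tokenize("Hi, I am 1 programmer."))
```

Because `finditer` simply stops at the end of the string, the `eof` group is not needed here.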
pillmuncher
import re

if re.search(some_regex_1, s) is not None:
    pass  # ...
elif re.search(some_regex_2, s) is not None:
    pass  # ...
# elif ... and so on

search() returns None when no match is found, so that if test fails and execution proceeds to the next elif test.
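A quick check of the behavior this answer relies on: `re.search()` returns `None` when there is no match and a match object otherwise, and match objects are always truthy, even when they match an empty string. Note that the `is not None` form discards the match object, so a branch that needs the match data has to keep the result of `re.search()` in a variable. The patterns below are arbitrary examples.

```python
import re

hit = re.search(r"\d+", "abc 123")
miss = re.search(r"\d+", "abc")
empty = re.search(r"x*", "abc")  # matches the empty string at position 0

print(miss is None)                      # -> True
print(bool(hit))                         # -> True
print(hit.group())                       # -> 123
print(bool(empty), repr(empty.group()))  # -> True ''
```

So for match results, `if match:` and `if match is not None:` behave the same.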

Andrew Sledge