How to split a very long regular expression in Python

Question

I have a very long regular expression:

 vpa_pattern = '(VAP) ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}): (.*)'

My code to match the groups is as follows:

import re

class ReExpr:
    def __init__(self):
        self.string = None
        self.rematch = None

    def search(self, regexp, string):
        # Remember the last match so group() can read from it.
        self.string = string
        self.rematch = re.search(regexp, self.string)
        return bool(self.rematch)

    def group(self, i):
        return self.rematch.group(i)

m = ReExpr()

if m.search(vpa_pattern, line):
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))

I tried to split the pattern across multiple lines in the following ways:

vpa_pattern = '(VAP) \
    ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):\
    (.*)'

I even tried:

 vpa_pattern = re.compile(('(VAP) \
    ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):\
    (.*)'))

But neither method works. There is a space between the groups, after each closing parenthesis and before the next opening one. I guess it is not being picked up when I split the pattern across multiple lines.

What about simpler regex like `(VAP) ((?:[0-9A-Fa-f]{2}:){5}) (.*)`? — Kiro, May 28 '14 at 12:41

score 8 · Answer 1 · answered May 28 '14 at 12:36

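Look at the re.X flag (also spelled re.VERBOSE). It allows comments in the regex and ignores whitespace in the pattern, except inside a character class or when escaped with a backslash.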

a = re.compile(r"""\d +  # the integral part
               \.    # the decimal point
               \d *  # some fractional digits""", re.X)
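
Applied to the pattern from the question, this could look roughly like the sketch below. The sample `line` is made up for illustration, and the six repeated hex pairs are condensed with a non-capturing group, which is a choice of this sketch and not something the question requires. The literal spaces are written as `[ ]` because bare whitespace is ignored under re.X.

```python
import re

# Verbose version of the pattern from the question. Literal spaces are
# written as [ ] because re.X ignores bare whitespace in the pattern.
vpa_pattern = re.compile(r"""
    (VAP) [ ]                  # group 1: the literal text VAP, then one space
    (                          # group 2: the whole address
        (?:[0-9A-Fa-f]{2}:){5} #   five hex pairs, each followed by a colon
        [0-9A-Fa-f]{2}         #   the sixth pair, with no colon inside the group
    )
    : [ ]                      # colon and space after the address
    (.*)                       # group 3: the rest of the line
""", re.X)

line = "VAP 00:1A:2b:3C:4d:5E: link up"   # hypothetical sample input
m = vpa_pattern.search(line)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
```

An escaped space (a backslash followed by a space) would work in place of `[ ]`, but the bracket form is easier to spot when reading the pattern.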

answered May 28 '14 at 12:36

Alex Shkop

+1 And it should also be noted that Python's `r"""raw multi-line string"""` syntax (used here) makes writing these self-documenting regexes much easier (because it completely avoids any backslash soup confusion). – ridgerunner May 28 '14 at 13:27

score 3 · Answer 2 · edited Jan 21 '15 at 17:22

Python concatenates adjacent string literals, so a long string can be written in parts across several lines if the parts are enclosed in parentheses:

>>> text = ("alfa" "beta"
... "gama")
...
>>> text
'alfabetagama'

or in your code:

text = ("alfa" "beta"
        "gama" "delta"
        "omega")
print(text)

will print

alfabetagamadeltaomega

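For the regular expression in the question, the same idea might be sketched as follows. The sample `line` is a made-up input for illustration. Nothing is inserted between the pieces, so the spaces that belong to the pattern have to be typed inside the quotes.

```python
import re

# The pattern from the question, split into adjacent raw string literals.
# Nothing is added between the pieces, so each space is typed explicitly.
vpa_pattern = (
    r'(VAP) '
    r'([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:'
    r'[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}): '
    r'(.*)'
)

line = "VAP 00:1A:2b:3C:4d:5E: link up"   # hypothetical sample input
m = re.search(vpa_pattern, line)
if m:
    print(m.group(2))   # 00:1A:2b:3C:4d:5E
```

This also suggests why the backslash continuation in the question fails: a backslash at the end of a line inside a string literal joins the lines, but the indentation at the start of the next line stays in the string, so the pattern ends up with extra spaces that the input does not contain.
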
edited Jan 21 '15 at 17:22

Michael Myers

answered May 28 '14 at 12:36

Jan Vlcinsky

score 1 · Answer 3 · edited May 23 '17 at 11:58

It's actually quite simple. You already use the {} notation; use it again. So instead of:

'([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):'

which is just a repeat of [0-9A-Fa-f]{2}: 6 times, you can use:

'([0-9A-Fa-f]{2}:){6}'

We can even simplify it further by using \d to represent digits:

'([\dA-Fa-f]{2}:){6}'

NOTE: Depending on which re function you use, you can pass in re.IGNORECASE (or its short form re.I) and simplify that chunk down to [\da-f]{2}:
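
A minimal sketch of that note, using the flag under its actual name in the re module, re.IGNORECASE (short form re.I). The name `pair` and the test strings are only illustrative.

```python
import re

# With re.IGNORECASE the character class only needs the lowercase range.
pair = re.compile(r'[\da-f]{2}:', re.IGNORECASE)

print(bool(pair.match('1A:')))   # True
print(bool(pair.match('zz:')))   # False
```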

So your final regex is:

'(VAP) ([\dA-Fa-f]{2}:){6} (.*)'

edited May 23 '17 at 11:58

Community

answered May 28 '14 at 12:47

BeetDemGuise

A repeating group only captures the last repetition. Instead, use a repeating non-capturing group inside a capturing group. Note also that OP's regex does not capture the last colon. – Janne Karila May 28 '14 at 13:04
If the OP's regex doesn't capture the final `:` then what is the `:` here: `'...[0-9A-Fa-f]{2}): (.*)'` doing? – BeetDemGuise May 28 '14 at 13:25
The `()` define a group which the OP accesses as `m.group(2)`. The last `:` is outside the parentheses. – Janne Karila May 28 '14 at 14:40
I see. They'll both recognize the same strings, though it seems the group structure may be different. – BeetDemGuise May 28 '14 at 14:46
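
The difference discussed in these comments can be checked with a short sketch. The sample `line` is a made-up input, and the second pattern is one possible way to write the "non-capturing group inside a capturing group" suggestion, not the only one.

```python
import re

line = "VAP 00:1A:2b:3C:4d:5E: link up"   # hypothetical sample input

# Repeating a capturing group: group 2 holds only the last repetition.
repeated = re.search(r'(VAP) ([\dA-Fa-f]{2}:){6} (.*)', line)
print(repeated.group(2))   # 5E:

# Non-capturing repetition inside one capturing group: group 2 holds the
# whole address, and the final colon stays outside the group.
nested = re.search(r'(VAP) ((?:[\dA-Fa-f]{2}:){5}[\dA-Fa-f]{2}): (.*)', line)
print(nested.group(2))     # 00:1A:2b:3C:4d:5E
```

Both patterns match the same line and give the same group 3, but only the second one returns the full address in group 2.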
