I have a string like
line = "... ... constant0 username@domain\r"
I need to extract the domain:
matchObj = re.match( 'constant\d+\s+(\w+)\@(\w+)', line, re.M|re.I)
matchObj is always None. What am I missing here?
From the docs on re.match:
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
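A minimal sketch of what that note means in practice. The sample text and variable names are made up for illustration; only `re.match`, `re.search`, and `re.MULTILINE` come from the standard library.

```python
import re

# Hypothetical two-line input, used only for this illustration.
text = "first line\nsecond line"

# re.match is anchored at position 0 of the whole string,
# so re.MULTILINE does not let it match at the start of line two.
no_match = re.match(r"second", text, re.MULTILINE)
print(no_match)  # None

# re.search with ^ and re.MULTILINE can match at the start of any line.
m = re.search(r"^second", text, re.MULTILINE)
print(m.group(0))  # second
print(m.start())   # 11
```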
re.match() only looks for a match at the start of the string, whereas re.search() scans the whole string. Most of the time you will want re.search() instead.
Observe:
>>> import re
>>> line = "... ... constant0 username@domain\r"
>>> matchObj = re.match( 'constant\d+\s+(\w+)\@(\w+)', line, re.M|re.I)
>>> matchObj # None
>>> matchObj = re.search('constant\d+\s+(\w+)\@(\w+)', line, re.M|re.I)
>>> matchObj
<_sre.SRE_Match object at 0x10ce84470>
>>> print matchObj.group(0)
constant0 username@domain
>>> print matchObj.group(1)
username
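Since the question asks for the domain rather than the username, here is a sketch of the same search written for Python 3, reading the second capture group. It assumes the domain consists only of word characters, as in the sample line; a real domain containing dots would need a wider character class. The names `pattern`, `username`, and `domain` are just illustrative.

```python
import re

line = "... ... constant0 username@domain\r"

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r"constant\d+\s+(\w+)@(\w+)", re.IGNORECASE)

match = pattern.search(line)
if match:
    # Group 1 is the username, group 2 is the domain.
    username, domain = match.groups()
    print(domain)  # domain
```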
Use re.search, not re.match. re.match only matches at the start of the string.
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
re.match only matches at the beginning of the string. re.search matches anywhere. As per the docs, don't just add a .* at the beginning of the pattern you pass to re.match; use re.search to get the optimization! (The regex compiler works out what the first character of a match must be, and re.search uses a quick loop in C to skip to the positions where that character occurs. A leading .* defeats this: the engine has to scan all the way to the end of the string and then backtrack to find the rest of the pattern.)
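Apart from speed, the two approaches also report different match spans. A small sketch using the line from the question (the variable names `found` and `padded` are made up for this example):

```python
import re

line = "... ... constant0 username@domain\r"
pattern = r"constant\d+\s+(\w+)@(\w+)"

# Preferred: let re.search find where the pattern starts.
found = re.search(pattern, line)

# Workaround: pad the pattern with .* so re.match can succeed.
padded = re.match(r".*" + pattern, line)

# The capture groups agree, but the overall match differs:
# the padded version swallows everything before "constant0".
print(repr(found.group(0)))   # 'constant0 username@domain'
print(repr(padded.group(0)))  # '... ... constant0 username@domain'
print(found.start(), padded.start())  # 8 0
```

So if you rely on group(0), start(), or span(), the .* workaround changes the answer as well as the cost.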
And here is a gentler and (I think) generally better introduction to most of Python's regular expression functionality.