When calling the function re.match
specifically, the ^
character does have little meaning because this function begins the matching process at the beginning of the line. However, it does have meaning for other functions in the re module, and when calling match on a compiled regular expression object.
For example:
text = """\
Mares eat oats
and does eat oats
"""
print re.findall('^(\w+)', text, re.MULTILINE)
This prints:
['Mares', 'and']
With a re.findall()
and re.MULTILINE
enabled, it gives you the first word (with no leading whitespace) on each line of your text.
It might be useful if doing something more complex, like lexical analysis with regular expressions, and passing into the compiled regular expression a starting position in the text it should start matching at (which you can choose to be the ending position from the previous match). See the documentation for RegexObject.match method.
Simple lexer / scanner as an example:
text = """\
Mares eat oats
and does eat oats
"""
pattern = r"""
(?P<firstword>^\w+)
|(?P<lastword>\w+$)
|(?P<word>\w+)
|(?P<whitespace>\s+)
|(?P<other>.)
"""
rx = re.compile(pattern, re.MULTILINE | re.VERBOSE)
def scan(text):
pos = 0
m = rx.match(text, pos)
while m:
toktype = m.lastgroup
tokvalue = m.group(toktype)
pos = m.end()
yield toktype, tokvalue
m = rx.match(text, pos)
for tok in scan(text):
print tok
which prints
('firstword', 'Mares')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')
('firstword', 'and')
('whitespace', ' ')
('word', 'does')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')
This distinguishes between types of word; a word at the beginning of a line, a word at the end of a line, and any other word.