How regex engines parse anchors

Question

Can someone explain how a regex engine works when it tries to match

  ^4$ against 749\n486\n4

I mean, how does the regex engine parse the string while performing the match?

Have a look at http://stackoverflow.com/questions/525004/short-example-of-regular-expression-converted-to-a-state-machine — Ahmed Masud, May 22 '13 at 02:39

score 0 · Answer 1 · answered May 22 '13 at 07:31

The regexp ^4$ means: match a line that contains only the digit 4.

If you apply this regexp to a string that contains newline characters, the embedded newlines are not treated as line boundaries by default: ^ matches only at the very start of the string, and $ only at the very end of the string (or just before a single trailing newline). The string 749\n486\n4 starts with 7, so ^4 already fails and there is no match. Example in Perl:

  DB<1> $str="749\n486\n4";
  DB<2> x $str =~ /^4$/
  empty array

Example in Python (re.search returns None here, which the interactive prompt does not print):

>>> import re
>>> s="749\n486\n4"
>>> re.search('^4$',s)
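A minimal Python 3 sketch, using only the standard re module, makes the default-mode rule visible: it checks a few strings against ^4$ and lists every position where each zero-width anchor can match in 749\n486\n4.

```python
import re

s = "749\n486\n4"

# Default mode: '^' only at position 0, '$' only at the end (or before a final '\n')
print(re.search(r"^4$", s) is None)       # True: the string starts with '7'
print(re.search(r"^4$", "4") is None)     # False: the whole string is "4"
print(re.search(r"^4$", "4\n") is None)   # False: '$' may sit just before a trailing newline

# Anchors consume no characters; list every position where each one succeeds
caret_positions = [m.start() for m in re.finditer(r"^", s)]
dollar_positions = [m.start() for m in re.finditer(r"$", s)]
print(caret_positions)   # [0]
print(dollar_positions)  # [9], i.e. len(s)
```

Because ^ can only succeed at offset 0, and the character there is 7, the engine has no way to complete a match for ^4$.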

However, regexp implementations have a way of dealing with this: a multiline setting, which makes ^ and $ also match at the embedded newlines. In Perl, that is the /m modifier:

  DB<3> x $str =~ /^4$/m
0  1

In Python, pass the re.MULTILINE flag:

>>> re.search('^4$',s,re.MULTILINE)
<_sre.SRE_Match object at 0x7f446874b030>
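To see where the anchors can sit once multiline mode is on, this Python 3 sketch (standard re module only) lists the positions of every empty match of ^ and $ in the same string. A match for ^4$ has to start at one of the ^ positions and end at one of the $ positions.

```python
import re

s = "749\n486\n4"

# With re.MULTILINE, '^' also succeeds right after each '\n', '$' right before each '\n'
starts = [m.start() for m in re.finditer(r"^", s, re.MULTILINE)]
ends = [m.start() for m in re.finditer(r"$", s, re.MULTILINE)]
print(starts)  # [0, 4, 8]
print(ends)    # [3, 7, 9]

m = re.search(r"^4$", s, re.MULTILINE)
# Line "486" starts with 4 at offset 4, but '$' fails at offset 5 ('8'),
# so the first successful match is the last line
print(m.span())  # (8, 9)
```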

The Python docs explain multiline mode like this:

re.MULTILINE When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
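The rules quoted above can be turned into a naive hand-rolled matcher for patterns of the form ^literal$. This is only a sketch of the anchor semantics, not how Python's re is actually implemented (real engines compile the pattern and add optimizations); the function names caret_ok, dollar_ok and match_anchored_literal are hypothetical. It tries each start position in turn, treats ^ and $ as zero-width tests that consume no characters, and compares its result with re.search.

```python
import re

def caret_ok(s: str, i: int, multiline: bool) -> bool:
    """Zero-width test for '^' at offset i (per the re.MULTILINE rules in the docs)."""
    if i == 0:
        return True
    return multiline and s[i - 1] == "\n"

def dollar_ok(s: str, i: int, multiline: bool) -> bool:
    """Zero-width test for '$' at offset i."""
    if i == len(s):
        return True
    if multiline:
        return s[i] == "\n"
    # Default mode: only immediately before a newline that ends the string
    return i == len(s) - 1 and s[i] == "\n"

def match_anchored_literal(literal: str, s: str, multiline: bool = False):
    """Hypothetical search for ^<literal>$; returns (start, end) or None."""
    for start in range(len(s) + 1):
        if not caret_ok(s, start, multiline):
            continue  # '^' fails here: nothing consumed, try the next start
        end = start + len(literal)
        if s[start:end] != literal:
            continue  # the literal part does not match at this start
        if dollar_ok(s, end, multiline):
            return (start, end)
    return None

s = "749\n486\n4"
for multiline in (False, True):
    real = re.search(r"^4$", s, re.MULTILINE if multiline else 0)
    mine = match_anchored_literal("4", s, multiline)
    print(multiline, mine, real.span() if real else None)
# False None None
# True (8, 9) (8, 9)
```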

If you actually want to know whether your multiline string ends with a line consisting of just the digit 4, use \z, which matches only at the absolute end of the string:

  DB<4> x $str =~ /^4\z/m
0  1
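Python's re has a comparable tool, though the letter differs: Python's \Z matches only at the absolute end of the string, like Perl's \z (Perl's own \Z also allows a trailing newline). A minimal sketch, where last_line_is_4 is a hypothetical helper name:

```python
import re

# '^' per line (with MULTILINE); '\Z' only at the absolute end of the string
PATTERN = r"^4\Z"

def last_line_is_4(text: str) -> bool:
    """Hypothetical helper: True if the final line is exactly '4' with nothing after it."""
    return re.search(PATTERN, text, re.MULTILINE) is not None

print(last_line_is_4("749\n486\n4"))    # True
print(last_line_is_4("749\n4\n486"))    # False: a middle line is "4", but not at the end
print(last_line_is_4("749\n486\n4\n"))  # False: \Z does not skip a trailing newline
# By contrast, '$' with MULTILINE also matches before any embedded newline
print(re.search(r"^4$", "749\n4\n486", re.MULTILINE) is not None)  # True
```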

See http://perldoc.perl.org/perlre.html, especially the sections on the m flag and the \A, \z and \Z assertions, or http://docs.python.org/2/library/re.html#regular-expression-objects
