
In Python:

Given the string 've, I can match the start of the string with a caret (^):

>>> import re
>>> s = u"'ve"
>>> re.match(u"^[\'][a-z]", s)
<_sre.SRE_Match object at 0x1109ee030>

So it matches even though the substring after the single quote is longer than one character.

But with the dollar sign (matching the end of the string), nothing is returned:

>>> import re
>>> s = u"'ve"
>>> re.match(u"[a-z]$", s)
>>> 

In Perl, from here, it seems that the end of the string can be matched with:

$s =~ /[\p{IsAlnum}]$/

Is $s =~ /[\p{IsAlnum}]$/ the same as re.match(u"[a-z]$", s)?

Why do the caret and the dollar sign behave differently? And do they behave differently in Python and Perl?

alvas

1 Answer


re.match is implicitly anchored at the start of the string. Quoting the documentation:

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.

Try re.search instead.

>>> import re
>>> s = u"'ve"
>>> re.search(u"[a-z]$", s)
<_sre.SRE_Match object at 0x7fea24df3780>
>>> 
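
To make the difference concrete, here is a minimal sketch that runs the same pattern through both functions. The variable names are only illustrative, and the third call (prefixing the pattern with `.*`) is just one possible way to keep using re.match, not something the documentation prescribes.

```python
import re

s = "'ve"

# re.match only tries the pattern at position 0. The first character is a
# quote, not a letter, so "[a-z]$" cannot match there and None is returned.
anchored = re.match(r"[a-z]$", s)

# re.search scans forward through the string and finds "e" at the end.
scanned = re.search(r"[a-z]$", s)

# To keep re.match, the pattern itself has to consume the leading characters.
prefixed = re.match(r".*[a-z]$", s)

print(anchored)          # None
print(scanned.group())   # e
print(prefixed.group())  # 've
```

This is also why the caret example in the question appears to work: re.match already starts at the beginning of the string, so the `^` there is redundant.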
Community
Robᵩ
  • Oh Thanks @Rob!! Tricky stuff between `re.search` vs `re.match`. – alvas Dec 22 '16 at 05:42
  • Just to be doubly sure: does regex matching in Perl always behave like `re.search`? – alvas Dec 22 '16 at 05:57
  • @alvas Correct. Perl doesn't have a separate operator for matching only at the beginning of a string; for that you use anchors (`\A` or `^`). – ThisSuitIsBlackNot Dec 22 '16 at 06:10
  • Thanks @ThisSuitIsBlackNot !! BTW, the two of you have helped much in this contribution https://github.com/nltk/nltk/pull/1553 – alvas Dec 22 '16 at 06:35