Is the behavior of Python vs Perl regex for caret and dollar different?

Question

In Python:

Given the string 've, I can match the start of the string with a caret:

>>> import re
>>> s = u"'ve"
>>> re.match(u"^[\'][a-z]", s)
<_sre.SRE_Match object at 0x1109ee030>

So it matches even though the substring after the single quote is longer than one character.

But with the dollar sign (matching the end of the string), there is no match:

>>> import re
>>> s = u"'ve"
>>> re.match(u"[a-z]$", s)
>>>

In Perl, from here, it seems that the end of the string can be matched with:

$s =~ /[\p{IsAlnum}]$/

Is $s =~ /[\p{IsAlnum}]$/ the same as re.match(u"[a-z]$", s)?

Why do the caret and the dollar sign behave differently? And does this behavior differ between Python and Perl?

`^[\'][a-z]` is better written `^'[a-z]` and `[\p{IsAlnum}]$` is better written `\p{IsAlnum}$`. – ThisSuitIsBlackNot, Dec 22 '16 at 06:01
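The first half of that comment is easy to check in Python: a character class holding a single quote matches exactly what the bare quote does. This is a small sketch; the sample strings are arbitrary, and the `\p{IsAlnum}` half is left out because Python's `re` module has no `\p{...}` classes.

```python
import re

# The one-character class [\'] and the bare literal ' should be interchangeable.
verbose = re.compile(r"^[\'][a-z]")
simple = re.compile(r"^'[a-z]")

samples = ["'ve", "'s", "ve", "''x", "'V"]
for s in samples:
    # Both spellings must agree on every sample string.
    assert bool(verbose.match(s)) == bool(simple.match(s))
    print(repr(s), bool(simple.match(s)))
```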

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12


re.match is implicitly anchored at the start of the string. Quoting the documentation:

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
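To make the implicit anchoring concrete, here is a minimal Python 3 sketch using the question's string. The `\A(?:...)` rewrite is an illustration of the equivalence for ordinary patterns, not how `re.match` is implemented internally.

```python
import re

s = "'ve"

# re.match only tries position 0. The first character is a quote,
# so a pattern that must start with a letter cannot match.
print(re.match(r"[a-z]", s))         # None

# A pattern that starts with the quote does match at position 0.
print(re.match(r"'", s).group())     # '

# For ordinary patterns, re.match(p, s) behaves like re.search with an
# explicit start-of-string anchor \A wrapped around p.
print(re.search(r"\A(?:[a-z])", s))  # None
```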

Try re.search instead.

>>> import re
>>> s = u"'ve"
>>> re.search(u"[a-z]$", s)
<_sre.SRE_Match object at 0x7fea24df3780>
>>>
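Putting the two calls side by side for the `$` case shows where the difference comes from (a sketch assuming Python 3; the match object's repr differs from the Python 2 output above). If you want to keep `re.match`, the pattern itself has to consume everything before the last letter.

```python
import re

s = "'ve"
pattern = r"[a-z]$"

# re.match: anchored at index 0, so "[a-z]$" would require the whole
# string to be a single letter. It is not, so there is no match.
assert re.match(pattern, s) is None

# re.search: tries every start position; "e" at the end satisfies "[a-z]$".
m = re.search(pattern, s)
print(m.group(), m.span())   # e (2, 3)

# Staying with re.match: let ".*" eat the prefix, then capture the last letter.
m2 = re.match(r".*([a-z])$", s)
print(m2.group(1))           # e
```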

answered Dec 22 '16 at 04:03 by Robᵩ (edited by Community)

Oh, thanks @Rob! Tricky distinction between `re.search` and `re.match`. – alvas Dec 22 '16 at 05:42
Just to be doubly sure: does regex matching in Perl always behave like `re.search`? – alvas Dec 22 '16 at 05:57

@alvas Correct. Perl doesn't have a separate operator for matching only at the beginning of a string; for that you use anchors (`\A` or `^`). – ThisSuitIsBlackNot Dec 22 '16 at 06:10
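In Python terms, a Perl `$s =~ /.../` test corresponds to `re.search`, with explicit anchors when you want start-of-string behavior. The sketch below assumes Python 3; `perl_like_match` is a hypothetical helper, not part of any library. Python's `re` has no `\p{IsAlnum}`, so `[^\W_]` (word characters minus underscore) is used as a rough stand-in; the two Unicode definitions are close but not identical. Either way, `[a-z]` is much narrower than `\p{IsAlnum}`, since it excludes uppercase letters, digits, and non-ASCII letters.

```python
import re

def perl_like_match(pattern, s):
    """Hypothetical helper: mimic Perl's `$s =~ /pattern/`, which, like
    re.search, may match anywhere unless the pattern is anchored."""
    return re.search(pattern, s) is not None

s = "'ve"

# Unanchored: succeeds wherever a letter appears, like Perl's /[a-z]/.
assert perl_like_match(r"[a-z]", s)

# Anchored with \A (or ^ without re.MULTILINE): only position 0 counts,
# which is what re.match does implicitly.
assert not perl_like_match(r"\A[a-z]", s)
assert perl_like_match(r"\A'[a-z]", s)

# Approximation of Perl's /\p{IsAlnum}$/ (assumption: [^\W_] is close enough).
alnum_at_end = r"[^\W_]$"
for text in ["'ve", "'VE", "x1", "café", "ends_"]:
    print(repr(text), perl_like_match(alnum_at_end, text))
```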
Thanks @ThisSuitIsBlackNot! BTW, both of you helped a lot with this contribution: https://github.com/nltk/nltk/pull/1553 – alvas Dec 22 '16 at 06:35
