A lookahead assertion generally follows some other pattern, as in:

Isaac(?!Asimov)

But what does it mean when it appears at the beginning of a pattern?

Here is an example (the simplename pattern is from docutils):

#!/usr/bin/env python3
import re

# simplename: runs of non-underscore word characters joined by single separators
regex = re.compile(r'(?P<simplename>(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*)')
inputs = ['_1_2_name', '1_2_name', 'name_1_2', ':123', 'name:s:s_dd', 'name_-+.sdf']
for s in inputs:
    # re.match only tries the pattern at the start of the string
    match = regex.match(s)
    if match:
        print("MATCH: ", match.group('simplename'))
    else:
        print("NOT MATCH: %s" % s)

In Python, this regex correctly matches names that don't start with _, but it behaves differently when I run the same regex with libpcre. Python does not match "_1_2_name", but libpcre matches it and returns "1_2_name".

I guess Python only tries the match at the beginning of the string but libpcre doesn't. What is the difference between Python and PCRE here? What is the equivalent pattern for PCRE?

ivan_pozdeev
yuuzu ka
  • I just tried the above with the Python flavor on regex101, and Python does not match the underscore there either. https://regex101.com/r/jV4YeO/3. Go ahead and try it: switch between the PCRE and Python flavors and you will not see a difference in matching. I also tried your code in the Python interpreter and got the same results. – smac89 Nov 29 '16 at 06:19
  • I call pcre_exec() with default options, so it works like re.search(); that's why it matches "_1_2_name". PCRE_ANCHORED is needed to make it match only at the beginning. – yuuzu ka Nov 29 '16 at 06:39
  • The regex will work the same way in both Python `re` and PCRE. The only difference is the usage that you confirmed in the comment above. `re.match` only searches for a match *at the string start*. – Wiktor Stribiżew Nov 29 '16 at 07:38
  • Damn, the misleading title tricked me. Half an hour wasted on an answer that has nothing to do with the real question. – ivan_pozdeev Dec 02 '16 at 18:05
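
The difference described in the comments can be reproduced in Python alone, without libpcre. The sketch below is only an illustration: it reuses the simplename pattern from the question and compares `re.match` (tried only at the start of the string) with `re.search` (tried at every position, which is how the asker says `pcre_exec()` behaves with default options). The `anchored` variant and the `simplename_of` helper are names made up for this example; prefixing the pattern with `\A` is one way to get the anchored behaviour inside the pattern itself, and `\A` is also valid PCRE syntax.

```python
import re

# The simplename pattern from the question, unchanged.
PATTERN = r'(?P<simplename>(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*)'

regex = re.compile(PATTERN)
# Hypothetical anchored variant: \A pins the match to the start of the string.
anchored = re.compile(r'\A' + PATTERN)


def simplename_of(match):
    """Return the captured name, or None when there was no match."""
    return match.group('simplename') if match else None


text = '_1_2_name'

# match(): the lookahead (?!_) fails at position 0, so there is no match.
print(simplename_of(regex.match(text)))      # None

# search(): after failing at position 0 the engine retries at position 1,
# where the lookahead succeeds, so the leading underscore is skipped.
print(simplename_of(regex.search(text)))     # 1_2_name

# search() with \A in the pattern behaves like match() again.
print(simplename_of(anchored.search(text)))  # None
```

So the leading `(?!_)` does not anchor anything by itself. It only rejects the current position, and an unanchored search is free to move on to the next one.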

0 Answers