The pattern (?<!(asp|php|jsp))\?.* works in PCRE, but it doesn't work in Python.

So what can I do to get this regex working in Python? (Python 2.7)

vvvvv
Matt Elson
  • How does it not work? Does it give an error? If so, post the error. Does it not match what you expect? If so, post the code where you use it and show the output you get vs. what you expect. – BrenBarn Dec 19 '12 at 08:07
  • Here is the [doc](http://docs.python.org/2/library/re.html) for the re module, which clearly states that negative lookbehind assertions are supported. – Chris Seymour Dec 19 '12 at 08:17
  • Negative lookbehinds work in re as long as all alternatives have the same length. So this works `(?<!asp|php|jsp)`, but not this `(?<!asp|php|html)`. – georg Dec 19 '12 at 08:28
  • @georg how would you accommodate strings of different lengths with lookbehinds? – physlexic Oct 26 '17 at 21:11
  • @physlexic: regex (https://pypi.python.org/pypi/regex/) can do this – georg Oct 27 '17 at 08:42

1 Answer

It works perfectly fine for me. Are you maybe using it wrong? Make sure to use re.search instead of re.match, because re.match only matches at the very start of the string:

>>> import re
>>> s = 'somestring.asp?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
>>> s = 'somestring.xml?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
<_sre.SRE_Match object at 0x0000000002DCB098>

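This is exactly how your pattern should behave: the first search returns None (so the interpreter prints nothing) because the ? is preceded by asp, and the second returns a match object. As glglgl mentioned, you can get the matched text if you assign that match object to a variable (say m) and then call m.group(). That yields ?1=123.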
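The same check can be written as a small script. This is only a sketch: the helper name query_part is made up for illustration and does not appear in the question. It stores the match object, calls group() on it, and also shows what happens when re.match is used by mistake.

```python
import re

# Pattern from the question: a "?" and everything after it,
# unless the "?" is directly preceded by asp, php or jsp.
pattern = re.compile(r"(?<!(asp|php|jsp))\?.*")


def query_part(url):
    """Hypothetical helper: return the matched text, or None if nothing matches."""
    m = pattern.search(url)  # search scans the whole string
    return m.group() if m else None


print(query_part('somestring.xml?1=123'))  # the "?..." part is matched
print(query_part('somestring.asp?1=123'))  # lookbehind rejects it, so None

# match only tries position 0, where there is no "?", so it finds nothing
# even for the .xml string.
print(pattern.match('somestring.xml?1=123'))
```

If the original code used re.match, that alone would explain a pattern that "doesn't work" in Python.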

By the way, you can leave out the inner parentheses. This pattern matches the same text, just without the unused capture group:

(?<!asp|php|jsp)\?.*
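As a rough check, the sketch below compares both spellings on a few made-up sample strings and then tries the mixed-length lookbehind from georg's comment, which re rejects when compiling. The last pattern, which chains one lookbehind per extension, is an assumption of mine and not something from this answer; it is one way to stay within the standard re module when the alternatives differ in length.

```python
import re

with_group = re.compile(r"(?<!(asp|php|jsp))\?.*")
without_group = re.compile(r"(?<!asp|php|jsp)\?.*")

# Made-up sample inputs, for illustration only.
samples = ['somestring.asp?1=123', 'somestring.xml?1=123',
           'index.jsp?x=1', 'no-query-here']

for s in samples:
    a = with_group.search(s)
    b = without_group.search(s)
    # Both spellings agree on whether they match and on the matched text.
    assert (a and a.group()) == (b and b.group())

# Alternatives of different lengths inside a lookbehind fail to compile in re.
try:
    re.compile(r"(?<!asp|php|html)\?.*")
    mixed_length_compiles = True
except re.error:
    mixed_length_compiles = False
print(mixed_length_compiles)

# Assumption, not from the answer: one fixed-width lookbehind per extension.
# Each lookbehind is checked at the same position, just before the "?".
stacked = re.compile(r"(?<!asp)(?<!php)(?<!html)\?.*")
print(stacked.search('page.html?a=1'))
print(stacked.search('page.xml?a=1').group())
```

The third-party regex module mentioned in the comments is the other option if variable-length lookbehinds are needed directly.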
Martin Ender