The command

re.compile(ur"(?<=,| |^)(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

throws a

sre_constants.error: look-behind requires fixed-width pattern

error in my program but regex101 shows it to be fine.

What I'm trying to do here is to match landmarks from addresses (each address is in a separate string) like:

  • "Opp foobar, foocity" --> Must match "Opp foobar"
  • "Fooplace, near barplace, barcity" --> Must match "near barplace"
  • "Fooplace, Shoppers Stop, foocity" --> Must match nothing
  • "Fooplace, opp barplace" --> Must match "opp barplace"

The lookbehind is to avoid matching words with opp in them (like in string 3).

Why is that error thrown? Is there an alternative to what I'm looking for?

Jongware
anupamGak
  • Why: ` ` and `,` are 1-width, `^` is 0-width, and Python can't handle the mismatch, because every alternative inside a look-behind must have the same width. – Amadan Jun 15 '15 at 09:04
  • For the "why" see http://stackoverflow.com/a/30750398/2564301 – Jongware Jun 15 '15 at 09:04
  • https://regex101.com/r/lX0mL2/1 - "At start of string, or preceded by space or comma" `(?<=,| |^)` can be rewritten as "Not preceded by something that is not a space or a comma" `(?<![^ ,])` (always a 1-width assertion, but since it's *negative* it will match at the beginning of the string as well). – Amadan Jun 15 '15 at 09:18
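
A minimal sketch of the rewrite suggested in the comment above, checked against the four sample addresses from the question. It assumes Python 3, so the patterns are written as `r"..."` rather than `ur"..."`; the names `KEYWORDS`, `compiles` and `find_landmark` are made up for illustration.

```python
import re

# Keyword alternation taken from the question.
KEYWORDS = r"(?:next to|near|beside|opp)"

# Original look-behind: alternatives of width 1, 1 and 0.
original = r"(?<=,| |^)" + KEYWORDS + r".+?(?=,|$)"
# Rewrite from the comment: a negative look-behind that is always 1 wide.
rewritten = r"(?<![^ ,])" + KEYWORDS + r".+?(?=,|$)"


def compiles(pattern):
    """Return True if the re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        return False


def find_landmark(address):
    """Return the first landmark phrase in the address, or None."""
    match = re.search(rewritten, address, re.IGNORECASE)
    return match.group(0) if match else None


print(compiles(original), compiles(rewritten))
for address in [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]:
    print(repr(address), "->", find_landmark(address))
```

The negative form works at the start of the string because there is no preceding character for `[^ ,]` to match, so the assertion succeeds there too.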

2 Answers

re.compile(ur"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

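You can combine the three conditions: put the comma and the space in a character class [] inside the lookbehind, and move ^ outside it as a separate alternative with |. See demo.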

https://regex101.com/r/vA8cB3/2#python
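
As a rough check of this pattern against the sample addresses from the question, here is a small sketch. It assumes Python 3 (hence `r"..."` instead of `ur"..."`); `LANDMARK_RE` and `extract_landmark` are illustrative names, not part of the answer.

```python
import re

# Pattern from the answer: "^" sits outside the look-behind, so the
# look-behind itself only contains the fixed-width class [, ].
LANDMARK_RE = re.compile(
    r"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)",
    re.IGNORECASE,
)


def extract_landmark(address):
    """Return the first landmark phrase in the address, or None."""
    match = LANDMARK_RE.search(address)
    return match.group(0) if match else None


for address in [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]:
    print(repr(address), "->", extract_landmark(address))
```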

vks
Use re.findall with the regex below. When the pattern contains a capturing group, re.findall returns the contents of that group instead of the whole match, so the leading and trailing delimiters are left out of the result.

re.compile(ur"(?m)(?:[, ]|^)((?:next to|near|beside|opp).+?)(?:,|$)", re.IGNORECASE)
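
A short sketch of how this could be used with re.findall on the sample addresses from the question. It assumes Python 3 (so `r"..."` replaces `ur"..."`), and `LANDMARK_RE` and `find_landmarks` are hypothetical names chosen for this example.

```python
import re

# Pattern from the answer: no look-behind at all. The delimiters are
# consumed by non-capturing groups and only the landmark is captured.
LANDMARK_RE = re.compile(
    r"(?m)(?:[, ]|^)((?:next to|near|beside|opp).+?)(?:,|$)",
    re.IGNORECASE,
)


def find_landmarks(address):
    """Return a list with the captured landmark phrases (possibly empty)."""
    # With exactly one capturing group, findall returns a list of that
    # group's text for each match.
    return LANDMARK_RE.findall(address)


for address in [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]:
    print(repr(address), "->", find_landmarks(address))
```

Because the delimiters are consumed rather than asserted, the comma that ends one match cannot also serve as the delimiter that starts the next one; that does not matter for the sample addresses, where each comma is followed by a space.
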
Avinash Raj