The following regex is supposed to match any :text: that is preceded by start-of-string, whitespace, or :, and followed by end-of-string, whitespace, or : (along with a few extra rules).

I'm not great at regex but I've come up with the desired solution in regexr.com:

(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

Result: :match1:, :match2:, :match3:, :match4:

But on Python 3 this raises an error.

re.search(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", txt)

re.error: look-behind requires fixed-width pattern

Anyone know a good workaround for this issue? Any tips are appreciated.

Nayncore
  • Make sure you use `(?<![^\s:]):[^\s:]+:(?![^\s:])` and not `(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)`. – Wiktor Stribiżew Oct 09 '19 at 20:09
  • Does this answer your question? [Python Regex Engine - "look-behind requires fixed-width pattern" Error](https://stackoverflow.com/questions/20089922/python-regex-engine-look-behind-requires-fixed-width-pattern-error) – oguz ismail Feb 28 '21 at 20:34
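
As a minimal sketch of the first pattern in the comment above, using only the standard-library re module: (?<![^\s:]) reads as "not preceded by a character other than whitespace or a colon", which also holds at the start of the string, so no ^ alternative is needed. The names PATTERN and data are illustrative, and the sample text is copied from the question.

```python
import re

# Negated lookarounds: each one contains a single character class,
# so the lookbehind is fixed-width and accepted by the stdlib re module.
PATTERN = re.compile(r"(?<![^\s:]):[^\s:]+:(?![^\s:])")

# Sample text from the question.
data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

# No capture groups, so findall returns the full matches.
matches = PATTERN.findall(data)
print(matches)
```

On the question's sample text this should give the same four tags as the expected result, without needing the third-party regex module.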

3 Answers

Possibly the easiest solution is to use the newer third-party regex module, which supports variable-width lookbehinds:

import regex as re

data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

for match in re.finditer(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", data):
    print(match.group(0))

This yields

:match1:
:match2:
:match3:
:match4:
Jan

In Python, you can use this workaround to avoid the error:

(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)

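The anchors ^ and $ are zero-width assertions anyway, so ^ can be moved out of the lookbehind, leaving a fixed-width (?<=[\s:]) that Python's re accepts.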
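
A small sketch of how this could be checked with the standard-library re module, assuming the sample text from the question. The names ORIGINAL, WORKAROUND, data, and original_compiles are only illustrative. It first confirms that the original pattern is rejected at compile time, then runs the workaround.

```python
import re

# Pattern from the question: the lookbehind mixes zero-width (^) and
# one-character (\s, :) alternatives, which the stdlib re rejects.
ORIGINAL = r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)"

# Workaround: ^ is moved outside, leaving a fixed-width lookbehind.
WORKAROUND = r"(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)"

# Sample text from the question.
data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

try:
    re.compile(ORIGINAL)
    original_compiles = True
except re.error as exc:
    original_compiles = False
    print("original pattern rejected:", exc)

# One capture group, so findall returns the captured :text: tokens.
matches = re.findall(WORKAROUND, data)
print(matches)
```

With the question's input, matches should contain :match1: through :match4: and none of the matchNot lines.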

anubhava

Another option would be to install regex:

$ pip3 install regex

Then we can write an expression that uses (*SKIP)(*FAIL) to discard the patterns we don't want:

import regex as re

expression = r'(?:^\d+:[^:\r\n]+:$|^:[^:\r\n]+:\d+$|^(?!.*:\b\S+\b:).*$)(*SKIP)(*FAIL)|:[a-z0-9]+:'
string = '''
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

'''

print(re.findall(expression, string))

Output

[':match1:', ':match2:', ':match3:', ':match4:']

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.

RegEx Circuit

jex.im visualizes regular expressions.

Emma