
I have this code:

import re

regex = re.compile("(.+?)\1+")
results = regex.findall("FFFFFFF")
print(results)

The expected result is:

['F']

According to regexpal, the regex does what it is supposed to do (find the shortest repeated substring). But when I run it in Python, the result is []. Why is this?

Asked by l.k.1234, edited by The Guy with The Hat

3 Answers


Try

regex = re.compile(r"(.+?)\1+")

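To see why the original pattern didn't work, compare what the two literals actually contain:

print(repr(r"(.+?)\1+"))  # '(.+?)\\1+' : the backslash and the 1 are kept
print(repr("(.+?)\1+"))   # '(.+?)\x01+' : \1 became the control character chr(1)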

See also: What does preceding a string literal with "r" mean?
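A minimal sketch that makes this concrete: it checks what each literal contains and runs the non-raw pattern against a string that really does contain chr(1). The input "ab\x01\x01" is a made-up example, not from the question.

```python
import re

raw = r"(.+?)\1+"     # backslash and "1" survive: the regex engine sees a backreference
cooked = "(.+?)\1+"   # "\1" is an octal escape: the engine sees the control character chr(1)

print(len(raw), len(cooked))   # 8 7
print("\1" == chr(1))          # True

# The non-raw pattern means "one or more characters, then one or more chr(1)",
# so it finds nothing in "FFFFFFF" but does match input that contains chr(1).
print(re.findall(cooked, "FFFFFFF"))      # []
print(re.findall(cooked, "ab\x01\x01"))   # ['ab']
print(re.findall(raw, "FFFFFFF"))         # ['F']
```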

Answered by emesday, edited by Community

Use raw strings:

>>> import re
>>> re.findall("(.+?)\1+", "FFFFFFF")
[]
>>> re.findall(r"(.+?)\1+", "FFFFFFF")
['F']

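Raw string literals, i.e. string literals prefixed with 'r', cause backslashes to be treated literally. Otherwise, backslashes start escape sequences: in a normal literal, \1 is the octal escape for the character with code 1, so the regex engine receives chr(1) instead of the backreference \1.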

Quoting from re — Regular expression operations:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. ...

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
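A short sketch of the quoted rule, and of why a missing r often goes unnoticed: escapes such as \n happen to work either way because the re module understands \n itself, while \1 is consumed by Python (turned into chr(1)) before re ever sees it.

```python
import re

# The docs' example: the raw literal has two characters, the normal one has one.
print(len(r"\n"), len("\n"))              # 2 1

# re understands \n itself, so both spellings match a newline character.
print(bool(re.fullmatch(r"\n", "\n")))    # True
print(bool(re.fullmatch("\n", "\n")))     # True

# \1 has no such luck: Python turns "\1" into chr(1) first,
# so only the raw spelling contains a backreference.
print(re.fullmatch(r"(.+?)\1+", "FFFFFFF") is not None)   # True
print(re.fullmatch("(.+?)\1+", "FFFFFFF") is not None)    # False
```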

Answered by devnull

Use a raw string:

regex = re.compile(r"(.+?)\1+")

or escape the backslash:

regex = re.compile("(.+?)\\1+")
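The raw form and the escaped form hand the re module the same pattern string. The sketch below checks that, and also shows what happens if the two fixes are combined as r"(.+?)\\1+": the doubled backslash then reaches the regex engine, which reads it as a literal backslash followed by "1+" rather than a backreference. The input "ab\\1" is a made-up example.

```python
import re

raw = r"(.+?)\1+"       # raw literal: the backslash is kept as-is
escaped = "(.+?)\\1+"   # normal literal: "\\" produces a single backslash
print(raw == escaped)                   # True
print(re.findall(escaped, "FFFFFFF"))   # ['F']

# Using both the r prefix and a doubled backslash puts two backslashes
# in the pattern, i.e. "match a literal backslash, then one or more 1s".
both = r"(.+?)\\1+"
print(both == raw)                      # False
print(re.findall(both, "FFFFFFF"))      # []
print(re.findall(both, "ab\\1"))        # ['ab']
```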
Answered by Barmar