94

I get an error message when I use this expression:

re.sub(r"([^\s\w])(\s*\1)+","\\1","...")

I checked the regex at RegExr and it returns . as expected. But when I try it in Python I get this error message:

raise error, v # invalid expression
sre_constants.error: nothing to repeat

Can someone please explain?

Alan Moore
goh
  • 4
    If anyone gets this error *for no apparent reason*, make sure that the version of Python used when creating your virtualenv still matches the version of the interpreter installed globally (e.g., an old virtualenv created before upgrading Python to a newer version). –  Jan 26 '15 at 18:03
  • @bvukelic How would I readjust so that they're the same? – Dave Liu Jun 12 '15 at 20:56
  • I just destroyed the existing env, and recreated it. –  Jun 15 '15 at 18:59
  • 1
    This is fixed in the current version of Python, which no longer throws an exception. See [Python Issue18647](https://bugs.python.org/issue18647). – Amir Ali Akbari Jul 19 '15 at 14:07
  • 3
    I had a silly cause of the error where I was matching for a char sequence that began with an asterisk. Escaping the asterisk helped. Check that this is not the issue before concluding that the known Python bug has caused the error. – Kevin Lee Jun 26 '16 at 17:41

6 Answers

58

It seems to be a Python bug (the same pattern works perfectly in Vim). The source of the problem is the (\s*...)+ part. Basically, you can't write (\s*)+, which makes sense, because you are trying to repeat something that can match the empty string.

>>> re.compile(r"(\s*)+")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

However, (\s*\1) can never be empty, although we only know that because we know what \1 contains. Apparently Python doesn't ... that's weird.
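To check how a given interpreter treats these patterns, something like the following sketch may help. `try_compile` is a hypothetical helper written for this illustration, not part of the `re` module, and whether the first two patterns compile depends on the Python version in use.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The two patterns discussed above, plus a variant whose group
# can never match the empty string.
for pattern in (r"(\s*)+", r"([^\s\w])(\s*\1)+", r"(\s+)+"):
    print(repr(pattern), "->", try_compile(pattern) or "compiles")
```

If the first two lines report "nothing to repeat", the interpreter predates the fix mentioned in the comments on the question.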

mb14
  • @alan: yes, I have noticed that as well. – mb14 Sep 09 '10 at 10:21
  • @goh: I guess you need to do it in two steps: first remove all the spaces between identical items, then do your previous substitution. You won't need the \s* that causes the problem anymore. – mb14 Sep 09 '10 at 11:15
  • Thanks, this helped me figure out a similar issue. For some reason re.compile(mypattern) worked on Windows but not on Linux. Go figure. My issue was that I had (.*$)? and had to change it to (.+$)? – Aaron Nov 19 '13 at 23:00
23

This is a Python bug in the interaction between "*" and special characters.

Instead of

re.compile(r"\w*")

Try:

re.compile(r"[a-zA-Z0-9]*")

It works; however, it is not the same regular expression (\w also matches the underscore, for instance).

This bug seems to have been fixed between 2.7.5 and 2.7.6.
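As a rough illustration of why the two patterns are not interchangeable, the sketch below compares them on a string containing an underscore. The sample string is an assumption chosen for the example.

```python
import re

sample = "snake_case"  # hypothetical input containing an underscore

# \w* accepts the underscore, so the whole string matches.
print(bool(re.fullmatch(r"\w*", sample)))
# The explicit class has no underscore, so the full match fails.
print(bool(re.fullmatch(r"[a-zA-Z0-9]*", sample)))
# Adding "_" to the class makes it accept this sample again.
print(bool(re.fullmatch(r"[a-zA-Z0-9_]*", sample)))
```

In Python 3, \w also matches non-ASCII word characters on str patterns by default, so even the last class is only an ASCII approximation.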

Charlie
Franklyn
13

Actually, it's not only a Python bug with *. It can also happen when you pass a string in as part of the regular expression to be compiled, like this:

import re
input_line = "string from any input source"
processed_line= "text to be edited with {}".format(input_line)
target = "text to be searched"
re.search(processed_line, target)

This will cause an error if processed_line contains something like "(+)", which you can find in chemical formulae, for example, or similar character sequences. The solution is to escape the string, but when you build the pattern on the fly it is easy to fail to do so properly...
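One way to do the escaping is the standard `re.escape` function, sketched below. `find_literal` and the sample strings are hypothetical names and data made up for this example.

```python
import re

def find_literal(fragment, target):
    """Search target for fragment as literal text rather than as a regex."""
    return re.search(re.escape(fragment), target)

input_line = "Na(+)"  # hypothetical input with regex metacharacters
target = "solution containing Na(+) ions"

# Used as-is, the "+" directly after "(" has nothing to repeat.
try:
    re.search(input_line, target)
except re.error as exc:
    print("unescaped:", exc)

# Escaped, the fragment is matched character for character.
match = find_literal(input_line, target)
print("escaped:", match.group(0) if match else None)
```

Escaping only the untrusted fragment, rather than the whole pattern, leaves any intentional regex syntax around it intact.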

Ando Jurai
13

Regular expressions use * and + as repetition operators, as in formal language theory. I encountered the same error when executing this line of code:

re.split("*",text)

To solve it, put a \ before the * or + (a raw string keeps the backslash intact):

re.split(r"\*", text)
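A short sketch of splitting on a literal asterisk, assuming a made-up sample string. When the separator is plain text, `str.split` avoids the regex engine altogether.

```python
import re

text = "a*b*c"  # hypothetical sample input

# Escape the asterisk so it is treated as a literal character.
parts = re.split(r"\*", text)
print(parts)  # ['a', 'b', 'c']

# For a fixed separator, the string method gives the same result.
print(text.split("*"))  # ['a', 'b', 'c']
```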
Ayoub Arroub
10

Beyond the bug that was discovered and fixed, I'll just note that the error message sre_constants.error: nothing to repeat is a bit confusing. I was trying to use r'?.*' as a pattern, and thought it was complaining for some strange reason about the *, but the problem is actually that ? is a way of saying "repeat zero or one times". So I needed to write r'\?.*' to match a literal ?.
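A small sketch of the two usual ways to match a literal question mark, using a made-up URL-like string as input:

```python
import re

url = "/search?q=regex"  # hypothetical sample input

# A backslash turns the quantifier into a literal character.
print(re.search(r"\?.*", url).group(0))   # ?q=regex

# Inside a character class, ? is literal as well.
print(re.search(r"[?].*", url).group(0))  # ?q=regex
```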

nealmcb
0

I had this problem when using the regex \b?. Using \s? fixed the issue (although it's not the same thing).

robertspierre