9

I've discovered something in Python's `re` module that I can't explain. Compiling `(a*)*` or `(a*|b)*` throws an error:

    raise error, v # invalid expression
    sre_constants.error: nothing to repeat

I've tested this regexp in JavaScript and it seems to be OK there.

Is it a bug?
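
A minimal sketch for reproducing the behaviour, assuming only the standard `re` module; `try_compile` is a hypothetical helper name. On the Python 2 interpreter that produced the traceback above, the first two patterns raise `nothing to repeat`. Other Python versions may accept them, so the printed result depends on the interpreter.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)

# The two patterns from the question, plus a plain one for comparison.
for pattern in ["(a*)*", "(a*|b)*", "a*"]:
    print(pattern, "->", try_compile(pattern) or "compiles")
```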

f0b0s
  • Related: http://stackoverflow.com/questions/3675144/regex-error-nothing-to-repeat – Kobi May 03 '11 at 12:49
  • I would add: logically, none of these makes sense. `(a*)*` is the same as `a*`, and `(a*|b)*` is the same as `[ab]*` (or `(a|b)*`). Is there a good use case, out of curiosity? – Kobi May 03 '11 at 13:00
  • @Kobi maybe if it's a greedy match, whereby you match as long a string as possible but fewer instances - but for this you'd want `+` not `*` – theheadofabroom May 03 '11 at 13:14

5 Answers

9

Yes, it's a bug (or at least a misfeature). It's complaining that if `a*` matches nothing, it doesn't know how to capture zero or more "nothings".

Mu Mind
5

A bug in Python.

http://bugs.python.org/issue2537

http://bugs.python.org/issue214033

Maybe "bug" is not the right word here; it is more a different interpretation of what the pattern should mean.

lzap
5

`a*` can match the empty string, giving `(empty)*`, which makes no sense to the interpreter. `(a*|b)` can also match the empty string, since it evaluates to either `(b)` or `(a*)`. You could use `(a+)*` instead, and likewise `(a+|b)*`.
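
A rough sketch of that workaround, assuming Python's standard `re` module (3.4 or later, for `re.fullmatch`). The sample strings and the `accepts` helper are made up for illustration. It checks on a handful of inputs that `(a+)*` accepts the same strings as `a*`, and `(a+|b)*` the same strings as `[ab]*`.

```python
import re

# Illustrative inputs only; not an exhaustive proof of equivalence.
samples = ["", "a", "aaa", "b", "abba", "abc", "c"]

def accepts(pattern, text):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

for text in samples:
    # The rewritten patterns agree with the simpler equivalents.
    assert accepts(r"(a+)*", text) == accepts(r"a*", text)
    assert accepts(r"(a+|b)*", text) == accepts(r"[ab]*", text)
```

The inner `+` guarantees that each repetition of the group consumes at least one character, while the outer `*` still allows zero repetitions, so the empty string is still accepted.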

theheadofabroom
1

There is actually an important reason for Python to reject `(a*)*` and `(a*|b)*`. Since `*` is greedy, it matches the longest string it can. The problem is that if the expression modified by `*` can match the empty string, the regular expression engine tries to match as many repetitions of the empty string as possible. This means it would match any number of empty strings between any two characters of the string you test against. Since `a*` is in a capturing group, it would have to capture all of those empty strings, which is impossible.
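
To make the empty-match point concrete, here is a small sketch assuming Python's standard `re` module. It only shows that `a*` succeeds with a zero-width match where there is no `a`, and that a group inside a zero-times repetition captures nothing. It does not reproduce the check the compiler performs.

```python
import re

# `a*` succeeds on a string with no "a" by matching the empty string.
empty = re.match(r"a*", "bbb")
print(repr(empty.group()), empty.span())

# With `(a+)*` the group is repeated zero times, so it never participates.
unset = re.match(r"(a+)*", "bbb")
print(repr(unset.group(0)), unset.group(1))
```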

murgatroid99
0

This seems to be a Python issue; see http://bugs.python.org/issue214033

It has also come up on Stack Overflow: "regex error - nothing to repeat" (http://stackoverflow.com/questions/3675144/regex-error-nothing-to-repeat)

bpgergo