I don't understand why '(\s*)+' gives the error 'nothing to repeat', while '(\s?)+' works just fine.
I've discovered that this problem has been known for quite some time (for example, see "regex error - nothing to repeat"), but I still see it in Python 3.3.1.
So I am wondering if there is a rational explanation for this behavior.
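For anyone who wants to reproduce this, here is a minimal sketch that tries to compile both patterns and reports what happens. The helper name `try_compile` is made up for illustration, and the result for '(\s*)+' may well differ on interpreters other than the 3.3.1 I tested, so I print the version alongside it.

```python
import re
import sys

def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Print the interpreter version, since the behavior may depend on it.
print(sys.version.split()[0])
for pattern in (r'(\s*)+', r'(\s?)+'):
    print(repr(pattern), '->', try_compile(pattern) or 'compiles')
```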
In reality I want to match a line of repeated words or numbers, for example:
'foo foo foo foo'
I've come up with this:
'(\w+)\s+(\1\s*)+'
It failed because of the second group: (\1\s*)+
In most cases I would probably not have more than one space between words, so (\1\s?)+ would work. For practical purposes, (\1\s{0,1000})+ should also work.
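The two workarounds can be sketched like this. The names `BOUNDED`, `SINGLE_SPACE` and `is_repeated` are just for illustration, and the trailing `$` is my addition so that the whole line has to consist of the repeated word.

```python
import re

# Workaround 1: replace the unbounded \s* with a large bounded repeat.
BOUNDED = re.compile(r'(\w+)\s+(\1\s{0,1000})+$')

# Workaround 2: allow at most one whitespace character after each repeat.
SINGLE_SPACE = re.compile(r'(\w+)\s+(\1\s?)+$')

def is_repeated(line, pattern=BOUNDED):
    """Return True if the line is one word or number repeated two or more times."""
    return pattern.match(line) is not None

print(is_repeated('foo foo foo foo'))             # every word is the same
print(is_repeated('foo foo bar'))                 # last word differs
print(is_repeated('foo foo  foo', SINGLE_SPACE))  # two spaces before the last word
```

The bounded version tolerates runs of whitespace between later repeats, while the `\s?` version only allows a single whitespace character there.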
Update: I should add that I've seen the problem in Python only. In Perl it works:
`('foo foo foo foo' =~ /(\w+)\s+(\1\s*)+/) `
I'm not sure it's equivalent, but the Vim version also works:
`\(\<\w\+\>\)\_s\+\(\1\_s*\)\+`
Update2: I found another regex implementation for Python, which is said to replace the current re module someday. I checked, and the error doesn't occur for the problematic cases above. This module has to be installed separately. It can be downloaded here or via PyPI.
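A rough sketch of how I tried it, assuming the module in question is the one published on PyPI as `regex` (that name is my assumption here). It is meant to be used like `re`; the fallback import is only there so the snippet still runs when the package is not installed, in which case the pattern may fail to compile on an affected interpreter. The function name `matches_repeated` is made up for illustration.

```python
try:
    # Third-party module, installed separately; assumed to be the PyPI package "regex".
    import regex as engine
except ImportError:
    # Fallback to the standard library; affected versions may reject the pattern.
    import re as engine

# The original pattern that failed for me with re.
PATTERN = r'(\w+)\s+(\1\s*)+'

def matches_repeated(line):
    """Return True if the line starts with a word repeated at least twice."""
    return engine.match(PATTERN, line) is not None

print(matches_repeated('foo foo foo foo'))
```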