Consider removing the whitespace from the text first. You can do that with a regex as well:
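>>> import re
>>> def repetitions(s):
...     r = re.compile(r"(.+?)\1+")
...     # Strip all whitespace before searching for repeated substrings.
...     for match in r.finditer(re.sub(r'\s+', "", s)):
...         # Integer division keeps the count an int on Python 3.
...         yield (match.group(1), len(match.group(0)) // len(match.group(1)))
...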
Output:
>>> list(repetitions("a bcab c"))
[('abc', 2)]
If you want to keep the spaces from the original text, try this regex instead: r"(\s*\S+\s*?\S*?)\1+". It has limitations, though.
>>> def repetitions(s):
...     r = re.compile(r"(\s*\S+\s*?\S*?)\1+")
...     # Search the text as-is, so the matched unit keeps its whitespace.
...     for match in r.finditer(s):
...         yield (match.group(1), len(match.group(0)) // len(match.group(1)))
...
Results:
>>> list(repetitions(" abc abc "))
[(' abc', 2)]
>>> list(repetitions("abc abc "))
[('abc ', 2)]
>>> list(repetitions(" ab c ab c "))
[(' ab c', 2)]
>>> list(repetitions("ab cab c "))
[('ab c', 2)]
>>> list(repetitions("blablabla"))
[('bla', 3)]
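To illustrate one of the limitations mentioned above: as far as I can tell, the space-preserving pattern allows at most one run of whitespace inside the repeated unit, so a unit with two internal gaps is missed. The sketch below compares both patterns on such an input. The helper names `repetitions_keep_spaces` and `repetitions_strip_spaces` are only illustrative, not part of the original answer.

```python
import re

# Space-preserving pattern from above: the unit is an optional leading
# whitespace run, non-space characters, then at most one more whitespace
# run followed by more non-space characters.
SPACED = re.compile(r"(\s*\S+\s*?\S*?)\1+")
# Pattern used after stripping whitespace.
PLAIN = re.compile(r"(.+?)\1+")


def repetitions_keep_spaces(s):
    # Search the original text, whitespace included.
    return [(m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in SPACED.finditer(s)]


def repetitions_strip_spaces(s):
    # Remove all whitespace first, then search.
    stripped = re.sub(r"\s+", "", s)
    return [(m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in PLAIN.finditer(stripped)]


text = "a b c a b c"  # the repeated unit "a b c" contains two gaps
print(repetitions_keep_spaces(text))   # []
print(repetitions_strip_spaces(text))  # [('abc', 2)]
```

So if the repeated unit can contain arbitrary whitespace, stripping it first is the safer choice, at the cost of losing the original spacing in the result.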