
I need to find the start and end positions of variable-length runs of a single repeated letter inside a string. I saw the topic "Finding multiple occurrences of a string within a string in Python", but I don't think it quite fits my case.

The following code prints nothing, while I expect it to find 5 elements.

import re
s = 'aaaaabaaaabaaabaaba'
pattern = '(a)\1+'
for el in re.finditer(pattern, s):
    print 'str found', el.start(), el.end()

Thanks in advance.

psb

1 Answer


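In an ordinary (non-raw) string literal, Python itself interprets '\1' as an octal escape and replaces it with the character whose code is 1, so the regex engine never sees a backreference. The backslash has to reach the regex engine untouched rather than being consumed at the string level.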
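To see the difference, the two literals can be compared directly. This is only a small illustrative sketch (written for Python 3; the variable names plain and raw are made up for the example):

```python
# Non-raw literal: Python converts \1 into the single character chr(1).
plain = '(a)\1+'
# Raw literal: the backslash and the digit are both kept for the regex engine.
raw = r'(a)\1+'

print(len(plain), len(raw))       # the raw string is one character longer
print(plain == '(a)\x01+')        # the non-raw pattern contains chr(1)
print(raw == '(a)\\1+')           # the raw pattern contains a real backslash
```

With the non-raw pattern, the regex looks for an 'a' followed by one or more chr(1) characters, which never occur in s, so nothing is found.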

You can use a raw string:

import re
s = 'aaaaabaaaabaaabaaba'
pattern = r'(a)\1+'   # raw string
for el in re.finditer(pattern, s):
    print 'str found', el.start(), el.end()

This prints four matches:

str found 0 5
str found 6 10
str found 11 14
str found 15 17
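
Note that (a)\1+ needs at least two consecutive letters, so the single 'a' at the end of s is skipped, which would account for getting fewer than the 5 elements expected in the question. If runs of length 1 should count as well, a plain a+ is probably enough. A minimal sketch under that assumption (Python 3; find_runs is a hypothetical helper name, not part of the re module):

```python
import re

def find_runs(s, letter='a'):
    """Return (start, end) pairs for every run of `letter` in s,
    including runs that are only one character long."""
    # re.escape keeps the letter literal even if it is a regex metacharacter.
    pattern = re.escape(letter) + '+'
    return [(m.start(), m.end()) for m in re.finditer(pattern, s)]

s = 'aaaaabaaaabaaabaaba'
for start, end in find_runs(s):
    print('str found', start, end)
```

For this s the helper returns five pairs, the last one being (18, 19) for the trailing 'a'.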
Willem Van Onsem