0

I'm doing a problem on rosalind that wants you to return the positions that a substring occurs in a longer string. The only problem is there is an overlapping occurrence and the output should be: 1, 3, 9 (assuming 0 based counting) but I'm only getting 1 and 9? Here's my code.

import re

s='GATATATGCATATACTT'
t='ATAT'

substrings=re.compile('ATAT')
matches=substrings.finditer(s)

for match in matches:
     print(match.start()+1)  #doesn't find overlapping ones

Any help would be appreciated, thanks!

3 Answers3

2

If you can install a third-party module, the regex module has an extended version of the re module API that allows an overlapped=True argument to be passed to findall and finditer.

https://pypi.python.org/pypi/regex

Otherwise, you might be able to adapt this answer.

Community
  • 1
  • 1
kindall
  • 178,883
  • 35
  • 278
  • 309
1

You need to use lookahead.

import re
s='GATATATGCATATACTT'
t='ATAT'
print([match.start() for match in re.finditer('(?=%s)' % t, s)])

Output:

[1, 3, 9]
Pythonista
  • 11,377
  • 2
  • 31
  • 50
  • Could you actually explain what a lookahead is? I'm a python beginner (a programming beginner actually) and haven't ever heard the phrase? How does it work? – pythonbeginner2506 Apr 25 '16 at 18:42
  • I think this might help http://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups and this http://www.rexegg.com/regex-lookarounds.html. – Pythonista Apr 25 '16 at 18:45
1

A 10 second search revealed this.

You basically have to surround your RegEx with "(?=" and ")". This is a positive lookahead, resulting in the RegEx that doesn't block parts of the string for future matches.

Be sure to capture group 1.

I hope I could help,

CodenameLambda

Community
  • 1
  • 1
CodenameLambda
  • 1,486
  • 11
  • 23