I am using the following to get all matches, including overlapping ones, as recommended in other threads:

[(m.start(0), m.end(0)) for m in re.findall(t,s,overlapped = True)]

where t is a substring of s. However, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: findall() got an unexpected keyword argument 'overlapped'

What am I doing wrong? Is overlapped an outdated flag? How would you do it? All help is much appreciated.

user1764359

1 Answer

As mentioned by Cunningham and Klaus, the overlapped flag I was referring to comes from the third-party regex package, not from the standard-library re module.

I did figure out a solution that doesn't require installing an external package, though, using a lookahead:

[(m.start(0), m.end(0)) for m in re.finditer('(?='+t+')',s)]

When s = 'GATATATGCATATACTT' and t = 'ATAT', you get [(1, 1), (3, 3), (9, 9)]. The start and end are equal because a lookahead is zero-width, so each match is the empty string. I don't need the matched text, just the start indices, so it doesn't matter that the matches are ['', '', ''].
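If you do want real end positions too, one possible variation (a sketch, not necessarily the best approach) is to put a capture group inside the lookahead and read the span of group 1. The helper name `overlapping_spans` is made up for illustration, and the `re.escape` call assumes t is literal text rather than a regex pattern:

```python
import re

def overlapping_spans(t, s):
    """Return (start, end) for each occurrence of the literal string t in s,
    including occurrences that overlap one another."""
    # re.escape assumes t is plain text, not a pattern.
    # The capture group sits inside the lookahead, so the text is recorded
    # in group 1 while the overall match stays zero-width.
    pattern = re.compile('(?=(' + re.escape(t) + '))')
    return [(m.start(1), m.end(1)) for m in pattern.finditer(s)]

print(overlapping_spans('ATAT', 'GATATATGCATATACTT'))
# [(1, 5), (3, 7), (9, 13)]
```

Dropping `re.escape` gives back the behaviour of the one-liner above when t is meant to be a pattern.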

user1764359
  • Regex can be tailored to do different overlapping. Do you understand how it works ? –  Jul 27 '15 at 17:56
  • do you mean the import package regex or re, which I am using here? I understand how to edit packages, yes. – user1764359 Jul 27 '15 at 18:01
  • Not the package. I mean why this `(?=ATAT)` gets overlaps. –  Jul 27 '15 at 18:08
  • Do you really want `[(1, 1), (3, 3), (9, 9)]` as the output? – Padraic Cunningham Jul 27 '15 at 18:11
  • @sln It works because the expression actually matches the '', each of which are non-overlapping. – user1764359 Jul 27 '15 at 18:16
  • no, @PadraicCunningham in my code i tossed the `m.end(0)` and used `' '.join(comprehension)` – user1764359 Jul 27 '15 at 18:18
  • @sln for further clarification it doesn't actually match what's inside the (?=pattern) because that's the lookahead function in re. – user1764359 Jul 27 '15 at 18:38
  • Actually it does match `(?=pattern)`, it just doesn't consume it. So you wouldn't think the match position changes between matches, but it does: it advances 1 character position before the next match starts. This is so there is not an infinite loop. It's called _bump along_. I think all engines do this. A safer approach is to force it along via `(?=pattern).` –  Jul 27 '15 at 18:49
  • @sln i think my understanding was there, but that's certainly a better way to describe what's going on. thanks for the in-depth explanation. – user1764359 Jul 27 '15 at 19:09
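To make the bump-along point from the comments concrete, here is a small comparison using the same s and t as the answer. It is only a sketch; the variable names `plain`, `lookahead`, and `forced` are made up for illustration:

```python
import re

s = 'GATATATGCATATACTT'

# Plain pattern: each match consumes all four characters, so the
# occurrence starting at index 3 is skipped.
plain = [m.start() for m in re.finditer('ATAT', s)]

# Lookahead only: every match is zero-width, and the engine bumps along
# one position after each one, so overlapping occurrences are found.
lookahead = [m.start() for m in re.finditer('(?=ATAT)', s)]

# Lookahead plus '.': each match explicitly consumes one character,
# so the advance does not rely on the engine's empty-match handling.
forced = [m.span() for m in re.finditer('(?=ATAT).', s)]

print(plain)      # [1, 9]
print(lookahead)  # [1, 3, 9]
print(forced)     # [(1, 2), (3, 4), (9, 10)]
```

Both lookahead variants report the same start indices here; the difference is only in whether the match itself is empty.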