Regex to find matched string position with start() and end() using lookaround

Asked Aug 31 '17 at 06:09

Active Aug 31 '17 at 06:14

Viewed 16 times

I have a string like aaadaa and want to search for aa in it and return the respective index positions of every occurrence, including overlapping ones, like (0, 1)(1, 2)(4, 5), using the start() and end() functions.

import re
sequence = "aaadaa"
query = "aa"
r = re.compile(query)
print([[m.start(),m.end()] for m in r.finditer(sequence)])

It gives me the output below:

[[0, 2], [4, 6]]

Obviously, I can see from this that it involves the use of lookarounds. Can someone give me the regular expression or a solution to find such positions?

edited Aug 31 '17 at 06:14

asked Aug 31 '17 at 06:09

user7422128

`query = "(?=(aa))"` and then use `m.start(1)` and `m.end(1)`. [Demo](https://ideone.com/ExxEfI). – Wiktor Stribiżew Aug 31 '17 at 06:10
It gives this output: [[0, 2], [1, 3], [4, 6]], and it's not a duplicate. – user7422128 Aug 31 '17 at 06:13

As for the pattern, it is a dupe since what you need is *overlapping matches*. As for the code, you may subtract 1 from the `end(1)` - [`[m.start(1),m.end(1)-1]`](https://ideone.com/ExxEfI). – Wiktor Stribiżew Aug 31 '17 at 06:15
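A minimal sketch of the approach suggested in these comments, assuming the sequence and query from the question. The lookahead itself matches an empty string, so the scan advances one character at a time, and capture group 1 holds the actual text; subtracting 1 from `end(1)` turns the exclusive end into the index of the last matched character.

```python
import re

sequence = "aaadaa"

# Zero-width lookahead: the overall match is empty, so finditer() tries
# every position; group 1 captures the "aa" that follows that position.
r = re.compile(r"(?=(aa))")

# end(1) is exclusive, so end(1) - 1 is the index of the last matched character.
positions = [[m.start(1), m.end(1) - 1] for m in r.finditer(sequence)]
print(positions)  # [[0, 1], [1, 2], [4, 5]]
```

Without the `- 1`, the same comprehension gives `[[0, 2], [1, 3], [4, 6]]`, which is the output reported above.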
I got the answer, but why are we subtracting 1 from m.end(1)? The start is counted from index 0, but the end is still one more than the last index. Isn't that strange? – user7422128 Aug 31 '17 at 06:21
Look, it is reasonable to say that the end index of `aa` in `aad` is 2, because the second `a` is at index 1. Index 1 is not the end; it is still within the range of the matched value. The matched value occupies index 0 and index 1, and the char at index 2 is already outside the matched range, so that is the end. I think this is the idea behind `.end()`. – Wiktor Stribiżew Aug 31 '17 at 07:01
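The exclusive end described in that comment can be checked with a small sketch, using the `aad` example from the comment. The variable names here are only illustrative. Because `end()` points one past the last matched character, it plugs directly into Python slicing, and `end() - start()` equals the match length.

```python
import re

text = "aad"
m = re.search("aa", text)

span = (m.start(), m.end())          # (0, 2): end is one past the last matched char
matched = text[m.start():m.end()]    # slicing uses the same exclusive end, giving "aa"
length = m.end() - m.start()         # 2, the number of matched characters
last_index = m.end() - 1             # 1, the index of the last matched character

print(span, matched, length, last_index)
```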
