I'd like to match all occurrences of a substring with Python.
I found this, but I'd like to match only occurrences of a substring that are separated by at most some distance (for example, a maximum of 6 characters). So if I have the string
AATCGTGCCGTGTGCCCCAAAATGAACGCGCCGCTGTG
I want to get the positions of every TG that is separated from another TG by at most 6 characters.
So in the example above I'd like to get [5, 10, 12, 34, 36]. I don't want the position of the middle TG, because it is too far away from either "group" (by 10 characters).
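To make the expected result precise, here is a brute-force sketch of what I mean. The helper name grouped_positions is just something I made up for illustration, and I'm assuming the "distance" is the number of characters between the end of one TG and the start of the next. I'm still hoping a single regex can do this.

```python
import re

def grouped_positions(text, sub, max_gap):
    """Return the start of every `sub` that has another `sub` within `max_gap` characters."""
    # Zero-width lookahead, so every occurrence is reported.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(sub)})", text)]
    keep = []
    for i, pos in enumerate(starts):
        # Gap = characters between the end of one occurrence and the start of the next
        # (my assumed definition of "distance").
        near_prev = i > 0 and pos - starts[i - 1] - len(sub) <= max_gap
        near_next = i + 1 < len(starts) and starts[i + 1] - pos - len(sub) <= max_gap
        if near_prev or near_next:
            keep.append(pos)
    return keep

seq = "AATCGTGCCGTGTGCCCCAAAATGAACGCGCCGCTGTG"
print(grouped_positions(seq, "TG", 6))  # [5, 10, 12, 34, 36]
```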
I tried this:
(?=TG(?:.+){1,6}?)
but it doesn't work.
EDIT
I created a regex that returns all the positions I want, except the last one in each group.
(?=TG.{0,6}TG)
If I use the example above, the returned positions are marked with |
AATCG|TGCCG|TGTGCCCCAAAATGAACGCGCCGC|TGTG
but I'd also like to get the positions marked with \
AATCG|TGCCG|TG\TGCCCCAAAATGAACGCGCCGC|TG\TG
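For reference, this is roughly how I run the pattern (a minimal sketch using re.finditer and taking the start of each zero-width match):

```python
import re

seq = "AATCGTGCCGTGTGCCCCAAAATGAACGCGCCGCTGTG"

# Lookahead only: a TG followed by 0-6 characters and another TG.
pattern = re.compile(r"(?=TG.{0,6}TG)")
found = [m.start() for m in pattern.finditer(seq)]
print(found)  # [5, 10, 34]
```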
I know why it doesn't work: it matches every TG that is followed by 0-6 arbitrary characters and one more TG, so the last TG of each group is never matched. But I can't figure out what I should add to make it work.
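One direction I can imagine, though I'm not sure it is the right one, is to also look backwards for a preceding TG. As far as I can tell Python's re module only accepts fixed-width lookbehinds, so (?<=TG.{0,6}) is rejected. The sketch below works around that by building one lookbehind per gap size. MAX_GAP and the variable names are my own.

```python
import re

MAX_GAP = 6
seq = "AATCGTGCCGTGTGCCCCAAAATGAACGCGCCGCTGTG"

# One fixed-width lookbehind per gap size: (?<=TG.{0}), (?<=TG.{1}), ..., (?<=TG.{6}).
behind = "|".join(f"(?<=TG.{{{k}}})" for k in range(MAX_GAP + 1))

# A TG counts if another TG follows within MAX_GAP characters,
# or if another TG ended at most MAX_GAP characters before it.
pattern = re.compile(rf"(?=TG)(?:(?=TG.{{0,{MAX_GAP}}}TG)|{behind})")

positions = [m.start() for m in pattern.finditer(seq)]
print(positions)  # [5, 10, 12, 34, 36]
```

This gets long quickly for larger distances, so a cleaner pattern would still be welcome.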