I'm trying to match all GY or YG combinations in my string QYGQGYGQQG using the re package in Python. I place all these matches in a dict for future look-up.
The problem I run into is when Y is flanked on either side by G: my regex can't capture both GY and YG in GYG properly.
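To show the issue on its own, here is a minimal sketch using the bare string 'GYG' (the variable names are just for illustration). As far as I understand, re scans left to right and returns non-overlapping matches, so once GY has consumed the Y, the overlapping YG is never reported:

```python
import re

# Plain alternation: the scanner resumes after the end of the first match,
# so the YG that overlaps with GY is skipped.
plain = re.findall('GY|YG', 'GYG')
print(plain)

# My pattern with the lookbehind: the trailing G is matched on its own,
# not as part of YG.
mine = [m.group() for m in re.finditer('(GY|YG)|(?<=Y)G', 'GYG')]
print(mine)
```

So the lookbehind does find the second G, but it is recorded under the key G rather than YG.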
This is my code so far:
import re
seq = 'QYGQGYGQQG'
regex = re.compile('(GY|YG)|(?<=Y)G')
iterator = regex.finditer(seq)
dd = {}
for matchedobj in iterator:
    dd[matchedobj.group()] = dd.get(matchedobj.group(), []) + [matchedobj.start()]
Output:
{'G': [6], 'GY': [4], 'YG': [1]}
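One direction that might work, sketched under the assumption that the goal is the start index of every GY and YG with overlaps included: wrap the alternation in a zero-width lookahead, so the scanner advances one character at a time and the pair is read from the capturing group. The name positions is only illustrative, and this is not necessarily the best approach:

```python
import re

seq = 'QYGQGYGQQG'

# (?=...) matches at a position without consuming characters, so a pair
# starting at the Y of GYG can still be found after GY was reported.
overlapping = re.compile(r'(?=(GY|YG))')

positions = {}
for m in overlapping.finditer(seq):
    # group(1) holds the two-letter pair; start() is where it begins.
    positions.setdefault(m.group(1), []).append(m.start())

print(positions)
```

With this, the G at index 6 would no longer appear as a separate key, since the YG starting at index 5 is picked up directly.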