How can I extend the code below to find all instances where there are two or fewer mismatches between my substring and the parent string?
Substring: SSQP
String-to-match-to: SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ
Here is an example where only one possible mismatch is incorporated:
>>> import re
>>> s = 'SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ'
>>> re.findall(r'(?=(SSQP|[A-Z]SQP|S[A-Z]QP|SS[A-Z]P|SSQ[A-Z]))', s)
['SSQQ', 'SSQP', 'SSQP', 'SSQP', 'SSQP', 'SSQP', 'SSQP', 'SSQP', 'SSQP']
Obviously, incorporating the possibility of two mismatches in the code above would require a lot of brute-force typing of all the possible combinations.
How can I extend this code (or refactor this code) to explore the possibility of two mismatches?
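One possible direction, sketched here rather than offered as a definitive solution, is to generate the alternation programmatically instead of typing it out. The helper name `fuzzy_pattern` and its `wildcard` parameter are hypothetical (they are not part of `re`); the sketch assumes mismatches are substitutions only (no insertions or deletions) and that the alphabet is uppercase letters, as in the pattern above.

```python
import re
from itertools import combinations

def fuzzy_pattern(sub, max_mismatches, wildcard="[A-Z]"):
    """Build a lookahead regex that matches `sub` with up to
    `max_mismatches` substituted characters (hypothetical helper)."""
    alternatives = []
    for k in range(max_mismatches + 1):
        # Choose every set of k positions that may hold any letter.
        for positions in combinations(range(len(sub)), k):
            chars = [wildcard if i in positions else re.escape(c)
                     for i, c in enumerate(sub)]
            alternatives.append("".join(chars))
    # The lookahead keeps matches zero-width, so overlapping hits are found.
    return "(?=(" + "|".join(alternatives) + "))"

s = 'SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ'

# With one mismatch this reproduces the hand-written pattern above.
one_mismatch_hits = re.findall(fuzzy_pattern("SSQP", 1), s)

# With two mismatches the pattern has 1 + 4 + 6 = 11 alternatives.
two_mismatch_hits = re.findall(fuzzy_pattern("SSQP", 2), s)
print(two_mismatch_hits)
```

The number of alternatives grows combinatorially with the substring length and the mismatch budget, so this is probably only practical for short substrings like the one here.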
Furthermore, I want to modify my output so that I get the numeric index (not 'SSQQ' or 'SSQP') of the exact position where the substring matched the string.
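To illustrate what index-based output could look like, here is a minimal sketch under the same assumption that a mismatch means a substituted character only. It shows two options: `re.finditer` with `m.start()` on the existing one-mismatch pattern, and a plain sliding-window comparison that counts differing characters (a Hamming distance), which avoids enumerating regex alternatives altogether. The function name `fuzzy_find` is hypothetical.

```python
import re

s = 'SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ'

# Option 1: keep the regex, but ask each match object for its start index.
pattern = r'(?=(SSQP|[A-Z]SQP|S[A-Z]QP|SS[A-Z]P|SSQ[A-Z]))'
regex_indices = [m.start() for m in re.finditer(pattern, s)]

# Option 2: slide a window over the string and count mismatches directly.
def fuzzy_find(sub, text, max_mismatches):
    """Return the 0-based start indices where `sub` lines up with `text`
    with at most `max_mismatches` differing characters (hypothetical helper)."""
    n = len(sub)
    hits = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        mismatches = sum(a != b for a, b in zip(sub, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

print(regex_indices)               # indices for at most one mismatch
print(fuzzy_find("SSQP", s, 2))    # indices for at most two mismatches
```

The sliding-window version does about len(text) * len(sub) character comparisons, and changing the mismatch budget is just a matter of changing the last argument.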