I have a script that returns palindrome substrings in a DNA sequence.
sequence="GATCTCTATACCAACTCAAAATGAAGACTCTTCTTTACACTTTCGAGCTCAGCAGGCTTACCGAGAAGAGTCGTCGTTCACATCCCCCCCTGTGCGAGATCAAGAAATTTGGCGACGTCGGCTTATTATCCTCCGCTGTCAATCAGTTGGACACATCTCTCCGGTCACTGCCGGACAAGCCAACCGAAGATTCGATTCTTCAGCAGCTTATCGACATTGCTGGTGGTGAAAAGCCAAGGCACAGCATCATAGTTGCGACCAATACGTCATACGACCGAGAGACATTGGTAAAGATCCTTCAACGATTCCCATACACCATACCTGGTCTGTCAGATTCAGGCTTGGAATCAGAAACACTCGAGGCTCTTGAGCACATCGCTTTTGCATTAGCCGGGCGATTAGCTCATAGATTTGACTACGGGTTCAATCCAGAGGCCAGTATCGTTCAACACCTCGAGATGTTCACCACCCTTTGGCACCAAAGATCTGCATTACCACCTGCGCCTGCCCCGTATCGACTTCCCGTTCCCGTCAATCAAGGAAGAGTCTCCTCATCAGATGATGGCTCTGATACTGAGTCAGAACTGGATGAAAAATACCACAACATCAAGAAGTCAGGACTTTGGAGGTTTCTGGATATGTTCAAAATGAACTTCAAGAGGTCTTAGATAACGGTCTAGTTCTAGTTCTGCAACTCACACTGA"
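print(len(sequence))

# Watson-Crick complement of each base
pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Slide a 6-letter window along the sequence
for i in range(len(sequence) - 6 + 1):
    pal = True
    # Compare the two outer base pairs of the window (positions 0/5 and 1/4)
    for j in range(2):
        if pairs[sequence[i + j]] != sequence[i + 5 - j]:
            pal = False
            break
    if pal:
        print(sequence[i : i + 6])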
It returns:
704
GATCTC
GAGCTC
GCAGGC
GTTCAC
GAGATC
TCAAGA
AAATTT
GACGTC
CAGTTG
TGGACA
AAGATT
CTTCAG
CCAAGG
CGACCG
TTGGAA
CTCGAG
TCTTGA
CTTGAG
TGAGCA
CGGGCG
ATAGAT
ACGGGT
TCCAGA
CTCGAG
TCGAGA
TGTTCA
GTTCAC
GGCACC
AGATCT
CACCTG
GCCTGC
GACTTC
CAGATG
AGAACT
TCAAGA
GAAGTC
TCAGGA
AGGACT
TCTGGA
TGTTCA
TTCAAA
TCAAGA
GAGGTC
AGGTCT
TAGATA
AGTTCT
AGTTCT
I want to find out whether these substrings are positioned next to "[ATCG]CC" or "[ATCG]GG".

My idea is to find the position of each palindrome in the sequence (for example from the i-th to the (i+5)-th letter, since the palindromes have length 6) and then check whether the (i+6)-th to (i+8)-th letters are [ATCG]CC or [ATCG]GG.

Do you know how I can write such a script? Or do you have a better approach in mind? Thank you.
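The plan described above can be sketched roughly as follows. This is only one possible way to write it, not a definitive solution. The names `hits_with_flank`, `PAIRS`, `FLANK` and the `pairs_checked` parameter are made up for this illustration. It assumes the sequence contains only the letters A, T, G and C, and it looks only at the three letters directly after each 6-letter hit, as in the plan above.

```python
import re

# Watson-Crick complement of each base (same table as in the question)
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The three letters right after a hit: any base, then CC or GG.
# FLANK is a hypothetical name chosen for this sketch.
FLANK = re.compile(r"[ATCG](?:CC|GG)")


def hits_with_flank(sequence, size=6, pairs_checked=2):
    """Yield (start, hit, flank) for hits followed by [ATCG]CC or [ATCG]GG.

    pairs_checked=2 reproduces the test in the question, which compares
    only the two outer base pairs; pass 3 to compare all pairs of a 6-mer.
    """
    for i in range(len(sequence) - size + 1):
        window = sequence[i:i + size]
        is_hit = all(
            PAIRS[window[j]] == window[size - 1 - j]
            for j in range(pairs_checked)
        )
        if not is_hit:
            continue
        # Letters at positions i+6 .. i+8; shorter than 3 near the end,
        # in which case fullmatch() fails and the hit is skipped.
        flank = sequence[i + size:i + size + 3]
        if FLANK.fullmatch(flank):
            yield i, window, flank


# Toy input: GAGCTC at position 0 is followed by AGG
for start, hit, flank in hits_with_flank("GAGCTCAGGTT"):
    print(start, hit, flank)  # prints: 0 GAGCTC AGG
```

Calling `hits_with_flank(sequence)` on the full sequence from the question yields the 0-based start of each hit together with the hit and its three following letters. Checking the letters before the hit (positions i-3 to i-1) would need a second, analogous slice.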