My current code:
import re
from Bio.Seq import Seq
def check_promoter(binding_element,promoter_seq):
    promoter_seq = str(promoter_seq)
    residues = list()
    for i in range(0,len(promoter_seq)):
        if binding_element[0] == promoter_seq[i]:
            ind = promoter_seq[i]
            for j in range(0,len(binding_element)):
                if binding_element[0+j] == promoter_seq[i+j-len(binding_element)]:
                    residues.append(i+j-len(binding_element))
    return residues
ESR1_promoter = Seq('''aagtcaggctgagagaatctcagaaggttgtggaagggtctatctacttt\
gggagcattttgcagaggaagaaactgaggtcctggcaggttgcattctc\
ctgatggcaaaatgcagctcttcctatatgtataccctgaatctccgccc\
ccttcccctcagatgccccctgtcagttcccccagctgctaaatatagct\
gtctgtggctggctgcgtatgcaaccgcacaccccattctatctgcccta\
tctcggttacagtgtagtcctccccagggtcatcctatgtacacactacg\
tatttctagccaacgaggagggggaatcaaacagaaagagagacaaacag\
agatatatcggagtctggcacggggcacataaggcagcacattagagaaa\
gccggcccctggatccgtctttcgcgtttattttaagcccagtcttccct\
gggccacctttagcagatcctcgtgcgcccccgccccctggccgtgaaac\
tcagcctctatccagcagcgacgacaagtaaagtaaagttcagggaagct\
gctctttgggatcgctccaaatcgagttgtgcctggagtgatgtttaagc\
caatgtcagggcaaggcaacagtccctggccgtcctccagcacctttgta\
atgcatatgagctcgggagaccagtacttaaagttggaggcccgggagcc\
caggagctggcggagggcgttcgtcctgggactgcacttgctcccgtcgg\
gtcgcccggcttcaccggacccgcaggctcccggggcagggccggggcca\
gagctcgcgtgtcggcgggacatgcgctgcgtcgcctctaacctcgggct\
gtgctctttttccaggtggcccgccggtttctgagccttctgccctgcgg\
ggacacggtctgcaccctgcccgcggccacggaccatgaccatgaccctc\
cacaccaaagcatctgggatggccctactgcatcagatccaagggaacga''')
ESR1_complement = ESR1_promoter.complement()
SBE = 'CAGACA'
print(check_promoter(SBE,ESR1_promoter))
print(check_promoter(SBE,ESR1_complement))
This code works when I test with the string 'aa', returning a list of the indices where 'aa' was found. But when I test with other sequences (e.g. 'tcc'), it finds no matches, even though there is clearly a 'tcc' in the sequence. Further, the re.findall method did identify the string 'CAGACA' in the complement string, but it does not provide an index.
Can anybody suggest what I'm doing wrong?
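On the regex point, re.finditer looks like it could report positions where re.findall only returns the matched text. A minimal sketch, assuming case-insensitive matching is wanted (the promoter above is lowercase while SBE is uppercase) and that overlapping hits should be reported; find_motif_positions is a hypothetical helper name, not part of Biopython:

```python
import re

def find_motif_positions(motif, sequence):
    """Hypothetical helper: 0-based start index of every match of motif."""
    # A zero-width lookahead (?=...) lets overlapping occurrences be found.
    # re.IGNORECASE is an assumption, so 'CAGACA' can match 'cagaca'.
    pattern = re.compile('(?=' + re.escape(motif) + ')', re.IGNORECASE)
    # str() accepts either a plain string or a Bio.Seq.Seq object.
    return [m.start() for m in pattern.finditer(str(sequence))]

print(find_motif_positions('tcc', 'aagtccttcc'))  # [3, 7]
```

Each match object's start() gives the index that re.findall leaves out.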
Also, a secondary problem: as you can see, I have cheated a little, since my code only checks the elements at
promoter_seq[i+j-len(binding_element)]
because I otherwise get an index error. Does anybody know a way around this?
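One way around the index error might be to stop the scan at the last position where a full-length window still fits, and to compare whole slices instead of single characters. A minimal sketch under that assumption; find_motif_by_slicing is a hypothetical name, and upper-casing both inputs is my own assumption about the intended matching:

```python
def find_motif_by_slicing(motif, sequence):
    """Hypothetical helper: 0-based start indices of motif in sequence."""
    # Normalise case (an assumption) so 'CAGACA' can match 'cagaca'.
    sequence = str(sequence).upper()
    motif = motif.upper()
    width = len(motif)
    # range() ends where the final full-width window begins, so
    # sequence[i:i + width] never reaches past the end of the string.
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]

print(find_motif_by_slicing('tcc', 'aagtccttcc'))  # [3, 7]
```

If the motif is longer than the sequence, the range is empty and the function returns an empty list.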
Thanks