I would like to make a sliding window in Python that examines a DNA sequence (anywhere between 2000 and 4000 base pairs long) in frames of 120 base pairs. However, I also want to take into account about 20 nucleotides flanking each 120 base pair frame on both the upstream and downstream sides. If the frame sits close to either end of the sequence, for example starting at position 14 or ending at position 1992 of a 2000 base pair sequence, then the upstream or downstream flank will necessarily be shorter than 20 base pairs.
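To make the edge cases concrete, the flank lengths I expect can be sketched as below. This is only an illustration: it assumes 0-based start positions, and the helper name `flank_lengths` is made up for this example rather than taken from my code.

```python
FRAME = 120  # frame length described above
FLANK = 20   # desired flank length described above


def flank_lengths(position, seq_len, frame=FRAME, flank=FLANK):
    """Return (upstream, downstream) flank lengths for a frame starting at `position`."""
    stop = position + frame
    upstream = min(flank, position)          # limited by the bases before the frame
    downstream = min(flank, seq_len - stop)  # limited by the bases after the frame
    return upstream, downstream


print(flank_lengths(14, 2000))    # frame starts at 14: (14, 20)
print(flank_lengths(1872, 2000))  # frame ends at 1992: (20, 8)
print(flank_lengths(500, 2000))   # frame in the middle: (20, 20)
```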
So far, I have designed my code like so:
    from Bio import SeqIO
    # Bio.Alphabet is only available in Biopython releases before 1.78
    from Bio.Alphabet.IUPAC import IUPACUnambiguousDNA

    fasta = SeqIO.to_dict(SeqIO.parse("RD4.fasta", "fasta", alphabet=IUPACUnambiguousDNA()))
    # Take the first record; list() makes the indexing work on Python 3 as well
    sequence = list(fasta.values())[0].seq
    print(sequence)

    # Hard-coded sequence used for testing (overwrites the record loaded above)
    sequence = "TGTGAATTCATACAAGCCGTAGTCGTGCAGAAGCGCAACACTCTTGGAGTGGCCTACAACGGCGCTCTCCGCGGCGCGGGCGTACCGGATATCTTAGCTGGTCAATAGCCATTTTTCAGCAATTTCTCAGTAACGCTACGGG"
    target_length = 120

    for position in range(len(sequence) - target_length + 1):
        stop = position + target_length
        potential_target_frame = sequence[position:stop]
        potential_target_frame = str(potential_target_frame)
        if position < 20:
            # Frame starts near the beginning: take whatever lies before it
            upstream_flank = sequence[:position]
            downstream_flank = sequence[stop:stop+20]
        elif len(sequence) - stop < 20:
            # Frame ends near the end: take whatever lies after it
            upstream_flank = sequence[position-20:position]
            downstream_flank = sequence[stop:]
        else:
            upstream_flank = sequence[position-20:position]
            downstream_flank = sequence[stop:stop+20]
        print("upstream flank is " + upstream_flank)
        print("downstream flank is " + downstream_flank)
The logic of this code looks sound to me, but the print output shows a problem: only the downstream flank is printed, not the upstream flank.
Is there a problem with how my conditional tree has been set up, or is the problem with how I am slicing my original sequence?
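For comparison, here is a more compact sketch of the behaviour I am after. It is not my current code: the function name `sliding_frames` and the toy sequence are made up for illustration, and it relies on Python truncating slice ends that run past the end of a string.

```python
def sliding_frames(sequence, frame=120, flank=20):
    """Yield (position, upstream, target, downstream) for every frame."""
    for position in range(len(sequence) - frame + 1):
        stop = position + frame
        # max() keeps the start index from going negative near the start of
        # the sequence; a slice end beyond len(sequence) is simply truncated.
        upstream = sequence[max(0, position - flank):position]
        target = sequence[position:stop]
        downstream = sequence[stop:stop + flank]
        yield position, upstream, target, downstream


# Toy input of 150 bases, so frames start at positions 0 to 30
toy = "ACGT" * 37 + "AC"
for position, up, target, down in sliding_frames(toy):
    if position in (0, 14, 30):
        print(position, len(up), len(target), len(down))
```

In this sketch the upstream flank at position 0 is an empty string, so printing it shows nothing after the label.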