I wrote some code that should print a record if a substring occurs in a particular section of the main string. I have the file below, and I create 5-letter substrings (5-mers) from seq11_rv.
>seq11_fw
TCAGATGTGTATAAGAGACAGTTATTAGCCGGTTCCAGGTATGCAGTATGAGAA
>seq11_rv
GAGATTATGTGGGAAAGTTCATGGAATCGAGCGGAGATGTGTATAAGAGACAGTGCCGCGCTTCACTAGAAGTCATACTGC
Then I take the reverse complement of each 5-mer and append it to a list. Next I look at seq11_fw: if positions [42:51] (GCAGTATGA in seq11_fw) contain any item from the list, a confirmation should be printed.
To make this easier to follow: the last 5-mer of seq11_rv is ACTGC, whose reverse complement is GCAGT. If you check seq11_fw[42:51], GCAGT occurs inside that slice, but I do not get any output.
Any help would be appreciated.
Here is my code:
from Bio import SeqIO
from Bio.Seq import Seq

with open(file, 'r') as f:
    lst = []
    for record in SeqIO.parse(f, 'fasta'):
        if len(record.seq) == 81:
            for i in range(len(record.seq)):
                kmer = str(record.seq[i:i + 5])
                if len(kmer) == 5:
                    C_kmer = Seq(kmer).complement()
                    lst.append(C_kmer[::-1])
        cnt = 0
        if len(record.seq) == 54 and any(str(items) in str(record.seq[42:51]) for items in lst):
            cnt += 1
        if cnt == 1:
            print(record.id)
            print(record.seq)
            print(lst)
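
For reference, the check I expect to succeed can be sketched with plain strings, leaving the FASTA parsing aside. This is only a sketch under some assumptions: it swaps Biopython's Seq.complement() for str.translate, it hard-codes the two sequences from the file above, and the helper names reverse_complement, rc_kmers and window_hits are made up for illustration. Note that it builds the whole 5-mer list before looking at seq11_fw. In the file, seq11_fw comes before seq11_rv, so as far as I can tell lst is still empty at the moment the 54-base record is tested in the loop above.

```python
# Sequences copied from the FASTA file shown in the question.
SEQ11_FW = "TCAGATGTGTATAAGAGACAGTTATTAGCCGGTTCCAGGTATGCAGTATGAGAA"
SEQ11_RV = ("GAGATTATGTGGGAAAGTTCATGGAATCGAGCGGAGATGTGTATAAGAGACAG"
            "TGCCGCGCTTCACTAGAAGTCATACTGC")

# Translation table for the DNA complement (plain-string stand-in for Bio.Seq).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the result."""
    return kmer.translate(COMPLEMENT)[::-1]


def rc_kmers(seq, k=5):
    """Reverse complements of every full-length k-mer in seq."""
    return [reverse_complement(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def window_hits(target, kmers, start=42, end=51):
    """K-mers from the list that occur inside target[start:end]."""
    window = target[start:end]
    return [kmer for kmer in kmers if kmer in window]


# Build the list from seq11_rv first, then test seq11_fw against it.
kmers = rc_kmers(SEQ11_RV)
hits = window_hits(SEQ11_FW, kmers)
if hits:
    print("seq11_fw")
    print(SEQ11_FW)
    print(hits)
```

With the list built first, GCAGT is among the hits, which is the result I was expecting from my original loop.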