In DNA sequence BLAST searches, results are generated in the following format (in DNA sequence data, ranges are inclusive, so the end position is part of the range):
blast_hit=['chr1',21,31,'chrx',11,21,'ATC--GGCA-CGAT-','AT-AAGG-ACC--TG']
#blast_hit=[query_chr, query_start, query_end, subject_chr, subject_start, subject_end, query_sequence, subject_sequence]
I need to retrieve the sequence and positions for a given range, for example [24,27] in query coordinates, from this result, as follows:
query_start=24, query_end=27, query_seq=GGCA
subject_start=15, subject_end=17, subject_seq=GGA (counterpart of GGCA)
For the query, it is easy to get the result with this code:
wanted_st=24
wanted_en=27
que_st=21
que_en=31
sub_st=11
sub_en=21
que_seq=blast_hit[6]
sub_seq=blast_hit[7]
# remove gaps, then slice; the +1 makes the end position inclusive
ref_seq_part=que_seq.replace('-','')[wanted_st-que_st:wanted_en-que_st+1]
Although it seems simple, I cannot come up with a solution to obtain the result for the subject, since the gaps ('-') mean the two sequences are not offset by a fixed amount. Does anyone have a solution for this problem?
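One possible approach for the subject side, sketched below, is to work in alignment columns rather than positions: find the first and last columns whose query base falls inside the wanted range, then read the subject string over those same columns. The function name `subject_range` is hypothetical, and the sketch assumes both aligned strings have the same length, '-' marks a gap, and both sequences are on the forward strand (coordinates increasing left to right); a reverse-strand hit would need the subject positions counted downwards instead.

```python
def subject_range(blast_hit, wanted_st, wanted_en):
    """Map an inclusive query range onto the subject (hypothetical helper).

    Returns (subject_start, subject_end, subject_seq), or None when the
    range lies outside the hit or aligns only to gaps in the subject.
    """
    _, que_st, _, _, sub_st, _, que_seq, sub_seq = blast_hit

    # Find the alignment columns that hold the wanted query bases.
    que_pos = que_st - 1          # position of the last query base seen
    first_col = last_col = None
    for col, base in enumerate(que_seq):
        if base == '-':
            continue              # gap columns do not consume a query position
        que_pos += 1
        if wanted_st <= que_pos <= wanted_en:
            if first_col is None:
                first_col = col
            last_col = col
    if first_col is None:
        return None

    # Subject bases left of the window give the offset of its first base.
    bases_before = len(sub_seq[:first_col].replace('-', ''))
    part = sub_seq[first_col:last_col + 1].replace('-', '')
    if not part:
        return None
    start = sub_st + bases_before
    return start, start + len(part) - 1, part


blast_hit = ['chr1', 21, 31, 'chrx', 11, 21, 'ATC--GGCA-CGAT-', 'AT-AAGG-ACC--TG']
print(subject_range(blast_hit, 24, 27))  # (15, 17, 'GGA')
```

Slicing by column keeps any subject bases that sit opposite query gaps inside the window, so the returned start and end always describe a contiguous stretch of the subject.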