Retrieving ranges from query and subject from BLAST output

Question

In DNA sequence BLAST searches, results are generated in the following format (in DNA sequence data, the end position of a range is inclusive):

blast_hit=['chr1',21,31,'chrx',11,21,'ATC--GGCA-CGAT-','AT-AAGG-ACC--TG'] 
#blast_hit=[query_chr,query_start position,query_end position,subject_chr,subject_start position,subject_end position,query_sequence,subject_sequence]

I need to retrieve the sequence and positions for a given query range, for example [24,27], from this result, as follows:

query_start=24, query_end=27, query_seq=GGCA
subject_start=15, subject_end=17, subject_seq=GGA (counterpart of GGCA)

For the query, it is easy to get the result with this code:

wanted_st=24
wanted_en=27
que_st=21
que_en=31
sub_st=11
sub_en=21
que_seq=blast_hit[6]
sub_seq=blast_hit[7]
# drop the gaps, then slice with 0-based offsets; the +1 makes the end inclusive
ref_seq_part=que_seq.replace('-','')[wanted_st-que_st:wanted_en-que_st+1]

Although it seems simple, I cannot come up with a solution to obtain the corresponding result for the subject. Does anyone have a solution for this problem?

Actually, the counterpart of GGCA is GG-A (you seem to ignore that by using replace). Perhaps you are looking for a way to substring from the sequence: http://stackoverflow.com/questions/509211/explain-pythons-slice-notation — jallmer, Dec 21 '15 at 08:38
Since this is a DNA sequence alignment, '-' means there is nothing at this position, which is why it is not shown in the result and why the range lengths differ between query and subject. I cannot use a plain substring because of these missing positions represented as dashes — Masih, Dec 21 '15 at 08:44
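
One possible way to handle the dashes described in the comments is to work in alignment columns instead of ungapped positions: find the columns of the aligned query string that hold the wanted start and end bases, cut the same columns out of the aligned subject string, and count the non-gap subject bases before and inside that window. The sketch below is only an illustration, not a tested solution from the thread. The function name `slice_alignment` and the returned dictionary keys are made up for this example, and it assumes the subject is on the forward strand (subject_start < subject_end) and that both aligned strings have the same length.

```python
def slice_alignment(hit, wanted_start, wanted_end):
    """Map an inclusive query range onto the subject of one hit.

    `hit` uses the layout from the question:
    [query_chr, query_start, query_end, subject_chr,
     subject_start, subject_end, query_seq, subject_seq]
    Assumption: forward-strand subject, equal-length aligned strings.
    """
    _q_chr, q_start, q_end, _s_chr, s_start, _s_end, q_aln, s_aln = hit
    if not (q_start <= wanted_start <= wanted_end <= q_end):
        raise ValueError("requested range lies outside the query range")

    # Walk the aligned query; only non-gap characters consume a position.
    q_pos = q_start - 1
    first_col = last_col = None
    for col, base in enumerate(q_aln):
        if base == '-':
            continue
        q_pos += 1
        if q_pos == wanted_start:
            first_col = col
        if q_pos == wanted_end:
            last_col = col
            break
    if first_col is None or last_col is None:
        raise ValueError("aligned query is shorter than its stated range")

    # Cut the same alignment columns out of both strings, then drop gaps.
    q_part = q_aln[first_col:last_col + 1].replace('-', '')
    s_part = s_aln[first_col:last_col + 1].replace('-', '')

    # Subject bases left of the window give the offset of the new start.
    s_before = len(s_aln[:first_col].replace('-', ''))
    if s_part:
        sub_start = s_start + s_before
        sub_end = sub_start + len(s_part) - 1
    else:
        # The whole window aligns to gaps in the subject.
        sub_start = sub_end = None

    return {
        'query_start': wanted_start, 'query_end': wanted_end,
        'query_seq': q_part,
        'subject_start': sub_start, 'subject_end': sub_end,
        'subject_seq': s_part,
    }


blast_hit = ['chr1', 21, 31, 'chrx', 11, 21,
             'ATC--GGCA-CGAT-', 'AT-AAGG-ACC--TG']
result = slice_alignment(blast_hit, 24, 27)
print(result['query_seq'], result['subject_start'],
      result['subject_end'], result['subject_seq'])
# prints: GGCA 15 17 GGA
```

For the hit in the question, the query range [24,27] falls on alignment columns 5 to 8, where the subject reads GG-A, so the subject result is positions 15 to 17 with sequence GGA. A subject on the reverse strand would need the position arithmetic reversed, which this sketch does not attempt.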
