
In DNA sequence BLAST searches, results are generated in the following format (in DNA sequence data, the end position of a range is inclusive):

blast_hit=['chr1',21,31,'chrx',11,21,'ATC--GGCA-CGAT-','AT-AAGG-ACC--TG'] 
#blast_hit=[query_chr, query_start, query_end, subject_chr, subject_start, subject_end, query_sequence, subject_sequence]

I need to retrieve the sequence and positions for a given query range, for example [24,27], from this result as follows:

query_start=24, query_end=27, query_seq=GGCA
subject_start=15, subject_end=17, subject_seq=GGA (counterpart of GGCA)

For the query, it is easy to get the result with this code:

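wanted_st=24  # first wanted query position (inclusive)
wanted_en=27  # last wanted query position (inclusive)
que_st=21
que_en=31
sub_st=11
sub_en=21
que_seq=blast_hit[6]
sub_seq=blast_hit[7]
# Drop gaps, then slice with offsets relative to que_st; +1 because the end is inclusive.
ref_seq_part=que_seq.replace('-','')[wanted_st-que_st:wanted_en-que_st+1]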

Although it seems simple, I cannot come up with a solution that obtains the corresponding result for the subject. I was wondering if anyone has a solution for this problem.

Masih
  • Actually, the counterpart of GGCA is GG-A (you seem to ignore that by using replace). Perhaps you are looking for a way to substring from the sequence: http://stackoverflow.com/questions/509211/explain-pythons-slice-notation – jallmer Dec 21 '15 at 08:38
  • Since this is a DNA sequence, '-' means there is nothing at this position; that is why it is not shown and why the range lengths differ between query and subject. I cannot use a plain substring because of these missing positions represented as dashes – Masih Dec 21 '15 at 08:44
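
One possible way to handle the gap problem raised in the comments is to walk the two aligned strings column by column, keep a running coordinate for each sequence, and collect the subject bases that fall in the columns spanned by the wanted query range. The sketch below is only an illustration under stated assumptions: both sequences run in the forward direction, positions are 1-based and inclusive, and the function name `subject_range_for_query` is hypothetical, not part of BLAST or any library.

```python
def subject_range_for_query(blast_hit, wanted_st, wanted_en):
    """Map an inclusive query range onto the subject via alignment columns.

    Assumes forward-strand coordinates on both sequences and that
    '-' marks a gap in the aligned strings.
    """
    _, que_st, _, _, sub_st, _, que_seq, sub_seq = blast_hit
    que_pos = que_st - 1  # coordinate of the last query base seen so far
    sub_pos = sub_st - 1  # coordinate of the last subject base seen so far
    que_part, sub_part, sub_positions = [], [], []

    for q, s in zip(que_seq, sub_seq):
        if q != '-':
            que_pos += 1
        if s != '-':
            sub_pos += 1

        if q != '-':
            # A real query base: inside the window if its coordinate is wanted.
            in_window = wanted_st <= que_pos <= wanted_en
        else:
            # A query gap: only counts when it lies between two wanted bases.
            in_window = wanted_st <= que_pos < wanted_en

        if in_window:
            if q != '-':
                que_part.append(q)
            if s != '-':
                sub_part.append(s)
                sub_positions.append(sub_pos)

    return {
        'query_seq': ''.join(que_part),
        'subject_start': sub_positions[0] if sub_positions else None,
        'subject_end': sub_positions[-1] if sub_positions else None,
        'subject_seq': ''.join(sub_part),
    }


blast_hit = ['chr1', 21, 31, 'chrx', 11, 21, 'ATC--GGCA-CGAT-', 'AT-AAGG-ACC--TG']
result = subject_range_for_query(blast_hit, 24, 27)
print(result)
# {'query_seq': 'GGCA', 'subject_start': 15, 'subject_end': 17, 'subject_seq': 'GGA'}
```

For the example hit, the wanted query bases occupy alignment columns 5 to 8, where the subject reads GG-A, so the ungapped counterpart is GGA at positions 15 to 17. Subject bases aligned to query gaps are kept only when the gap sits between two wanted query bases; if the wanted range aligns entirely to subject gaps, the subject start and end are returned as None. A hit on the reverse strand would need the subject counter to run downward instead, which this sketch does not cover.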

0 Answers