Consider the following Python function, which removes dashes ("gaps") from a string while keeping the annotations on that string correct. The inputs instring and annotations are a string and a dictionary, respectively; the dictionary maps each range name to a list of positions in the string.
def DegapButMaintainAnno(instring, annotations):
    degapped_instring = ''
    degapped_annotations = {}
    gaps_cumulative = 0
    for range_name, index_list in annotations.items():
        gaps_within_range = 0
        for pos, char in enumerate(instring):
            if pos in index_list and char == '-':
                index_list.remove(pos)
                gaps_within_range += 1
            if pos in index_list and char != '-':
                degapped_instring += char
                index_list[index_list.index(pos)] = pos - gaps_within_range
        index_list = [i-gaps_cumulative for i in index_list]
        degapped_annotations[range_name] = index_list
        gaps_cumulative += gaps_within_range
    return (degapped_instring, degapped_annotations)
The function works as expected if none of the ranges in the input dictionary overlap:
>>> instr = "A--AT--T"
>>> annot = {"range1":[0,1,2,3,4], "range2":[5,6,7]}
>>> DegapButMaintainAnno(instr, annot)
Out: ('AATT', {'range1': [0, 1, 2], 'range2': [3]})
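As soon as two or more ranges overlap, however, the code fails, because the character at a position shared by two ranges (index 4 in the example) is appended to the output string once per range: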
>>> annot = {"range1":[0,1,2,3,4], "range2":[4,5,6,7]}
>>> DegapButMaintainAnno(instr, annot)
Out: ('AATTT', {'range1': [0, 1, 2], 'range2': [2, 3]}) # See additional 'T' in string
Does anyone have a suggestion for how to correct my code so that it handles overlapping ranges?
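
One possible direction, offered as a sketch under assumptions rather than a verified fix: degap the string once, record where each surviving character ends up, and then remap every range independently through that lookup. Because the string is built outside the per-range loop, a position shared by several ranges can no longer be appended twice. The name degap_with_overlaps is hypothetical, and the sketch assumes that every non-gap character should be kept, including characters not covered by any range (the original function only keeps characters that belong to a range).

```python
def degap_with_overlaps(instring, annotations):
    """Remove '-' characters and remap annotation indices.

    Hypothetical helper; assumes every non-gap character is kept,
    whether or not a range covers it.
    """
    new_pos = {}      # original index of a kept character -> index after degapping
    kept_chars = []
    for pos, char in enumerate(instring):
        if char != '-':
            new_pos[pos] = len(kept_chars)
            kept_chars.append(char)

    # Remap each range on its own; indices that pointed at gaps drop out.
    # New lists are built, so the caller's dictionary is left unmodified.
    degapped_annotations = {
        range_name: [new_pos[i] for i in index_list if i in new_pos]
        for range_name, index_list in annotations.items()
    }
    return ''.join(kept_chars), degapped_annotations


instr = "A--AT--T"
annot = {"range1": [0, 1, 2, 3, 4], "range2": [4, 5, 6, 7]}
print(degap_with_overlaps(instr, annot))
# ('AATT', {'range1': [0, 1, 2], 'range2': [2, 3]})
```

With the non-overlapping input from the first example, this sketch returns the same result as the original function.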