Consider the following Python function, which removes dashes ("gaps") from a string while keeping the annotations on that string correct. The inputs `instring` and `annotations` are a string and a dictionary, respectively.

def DegapButMaintainAnno(instring, annotations):
    degapped_instring = ''
    degapped_annotations = {}
    gaps_cumulative = 0
    for range_name, index_list in annotations.items():
        gaps_within_range = 0
        for pos, char in enumerate(instring):
            if pos in index_list and char == '-':
                index_list.remove(pos)
                gaps_within_range += 1
            if pos in index_list and char != '-':
                degapped_instring += char
                index_list[index_list.index(pos)] = pos - gaps_within_range
        index_list = [i-gaps_cumulative for i in index_list]
        degapped_annotations[range_name] = index_list
        gaps_cumulative += gaps_within_range
    return (degapped_instring, degapped_annotations)

This function works as expected if none of the ranges specified by the input dictionary overlap:

>>> instr = "A--AT--T"
>>> annot = {"range1":[0,1,2,3,4], "range2":[5,6,7]}
>>> DegapButMaintainAnno(instr, annot)
Out: ('AATT', {'range1': [0, 1, 2], 'range2': [3]})

As soon as one or more of the ranges overlap, however, the code fails:

>>> annot = {"range1":[0,1,2,3,4], "range2":[4,5,6,7]}
>>> DegapButMaintainAnno(instr, annot)
Out: ('AATTT', {'range1': [0, 1, 2], 'range2': [2, 3]}) # See additional 'T' in string

Does someone have a suggestion on how to correct my code for overlapping ranges?

Michael Gruenstaeudl
  • if you explained the meaning of the annotations it would help a lot... much better than reverse-engineering the algorithm – Pynchia Feb 05 '16 at 21:24
  • @Pynchia Annotations are common in bioinformatics and refer to the idea that different sections (or, technically, ranges) of a string represent indivisible substrings that should be operated on. – Michael Gruenstaeudl Feb 05 '16 at 21:27
  • OK, but can you explain the rationale that makes the annotation for range1 go from `[0,1,2,3,4]` to `[0,1,2]` ? What are the numbers? Indexes in the string? How do they need to work when the dashes are removed? – Pynchia Feb 05 '16 at 21:29
  • @Pynchia You simply remove the dashes from the substring, so that only the letters remain (hence, "removing dashes" in my title). The numbers in each list refer to the index positions in each string. – Michael Gruenstaeudl Feb 05 '16 at 21:32
  • @Pynchia The above Python function has already undergone some scrutiny and discussion, as evidenced here [http://stackoverflow.com/questions/34816513/improving-code-design-of-dna-alignment-degapping]. Hence, it may not be useful to reverse-engineer the algorithm, but rather to add functionality to it. – Michael Gruenstaeudl Feb 05 '16 at 21:49
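
A note on why the overlapping example misbehaves: in `DegapButMaintainAnno`, the line `degapped_instring += char` runs inside the loop over ranges, so a non-gap character is appended once for every range that lists its position. The small check below is only a sketch (the names `coverage` and `duplicated` are illustrative and not part of the original code); it suggests that position 4 is the only such character in the failing input, which matches the extra 'T'.

```python
from collections import Counter

instr = "A--AT--T"
annot = {"range1": [0, 1, 2, 3, 4], "range2": [4, 5, 6, 7]}

# Count how many ranges list each non-gap position.
coverage = Counter(
    pos
    for index_list in annot.values()
    for pos in index_list
    if instr[pos] != '-'
)

# Positions listed by more than one range get appended once per range.
duplicated = {pos: instr[pos] for pos, n in coverage.items() if n > 1}
print(duplicated)  # {4: 'T'}
```

Building the degapped string once, outside the per-range loop, would avoid this duplication.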

1 Answer

I think you might be over-thinking things. Here's my attempt:

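from copy import deepcopy

def rewriteGene(instr, annos):
    # Deep-copy so the caller's index lists are not mutated by ls.remove() below.
    annotations = deepcopy(annos)
    index = instr.find('-')
    while index > -1:
        for key, ls in annotations.items():
            # Drop the gap position itself from any range that contains it.
            if index in ls:
                ls.remove(index)
            # Shift every position to the right of the gap one step left.
            annotations[key] = [e-1 if e > index else e for e in ls]
        # Cut the gap character out of the string and look for the next one.
        instr = instr[:index] + instr[index+1:]
        index = instr.find('-')
    return instr, annotations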

instr = "A--AT--T"
annos = {"range1":[0,1,2,3,4], "range2":[4,5,6,7]}

print(rewriteGene(instr, annos))
# ('AATT', {'range1': [0, 1, 2], 'range2': [2, 3]})

It should be pretty readable as is, but let me know if you want clarification on anything.
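
As an alternative to repeated `find` and slicing, the same result can probably be reached in a single pass by first building a map from old positions to new positions and then remapping each range on its own. This is a minimal sketch and not the method used above; the function name `degap_with_mapping` and the `gap` parameter are hypothetical, and it assumes every annotated position is a valid index into the string.

```python
def degap_with_mapping(instring, annotations, gap='-'):
    """Remove gap characters and remap every annotated position in one pass."""
    # new_index[old_pos] is the position in the degapped string, or None for a gap.
    new_index = []
    kept = []
    for char in instring:
        if char == gap:
            new_index.append(None)
        else:
            new_index.append(len(kept))
            kept.append(char)
    # Each range is remapped independently, so overlaps need no special handling.
    degapped_annotations = {
        name: [new_index[pos] for pos in positions if new_index[pos] is not None]
        for name, positions in annotations.items()
    }
    return ''.join(kept), degapped_annotations


print(degap_with_mapping("A--AT--T",
                         {"range1": [0, 1, 2, 3, 4], "range2": [4, 5, 6, 7]}))
# ('AATT', {'range1': [0, 1, 2], 'range2': [2, 3]})
```

Because the string is built once and the input lists are only read, overlapping ranges cannot duplicate characters and the caller's dictionary is left unchanged.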

Jared Goguen
  • Good to see you again, stranger. I remember that you contributed to my previous question on this topic. Your code does seem to solve the issue of overlapping annotations. For example, the input `instr, annos = "AA----TT", {"gene1":[0,1,2,3,4], "gene2":[4,5,6,7]}` correctly removes the overlap in annotations (because the overlapping position is a gap) and results in `('AATT', {'gene1': [0, 1], 'gene2': [2, 3]})`. Thanks for your help! – Michael Gruenstaeudl Feb 05 '16 at 22:29