
I am trying to split a long string into short strings (bins) using a while loop in Python. I get unexpected behavior when I use a floating-point number in the while condition. The following code is expected to return 20 bins of 5 letters each:

sequence = 'TTAGAGGAAATATGAAAACCCTAGAATCGGAAGAAAACTATATATGTATATCTTTCCGTTGACTTTATATAGAATGAAATCAAGGAAAGAAAAGAGCTAA'
seqLength = len(sequence)
bins = 20
binLength = int(round(seqLength / bins))

leftFrame = 0
rightFrame = 0.05
subSeqs = []
while rightFrame <= 1.0:
    subSeq = sequence[int(round(leftFrame * seqLength)) : int(round(rightFrame * seqLength))]
    subSeqs.append(subSeq)
    leftFrame, rightFrame = rightFrame, rightFrame + 0.05

print(f"The sequence is {seqLength} nucleotides. Each bin is {binLength} nt long.")
print(subSeqs, "\n", len(subSeqs))

This returns:

The sequence is 100 nucleotides. Each bin is 5 nt long.
['TTAGA', 'GGAAA', 'TATGA', 'AAACC', 'CTAGA', 'ATCGG', 'AAGAA', 'AACTA', 'TATAT', 'GTATA', 'TCTTT', 'CCGTT', 'GACTT', 'TATAT', 'AGAAT', 'GAAAT', 'CAAGG', 'AAAGA', 'AAAGA'] 
 19

However, replacing line 9 (the while statement) with `while rightFrame <= 1.01:` returns the correct output:

The sequence is 100 nucleotides. Each bin is 5 nt long.
['TTAGA', 'GGAAA', 'TATGA', 'AAACC', 'CTAGA', 'ATCGG', 'AAGAA', 'AACTA', 'TATAT', 'GTATA', 'TCTTT', 'CCGTT', 'GACTT', 'TATAT', 'AGAAT', 'GAAAT', 'CAAGG', 'AAAGA', 'AAAGA', 'GCTAA'] 
 20

A threshold with up to 14 zeros after the decimal point (1.000000000000001) works, but adding a 15th zero breaks it again. How can I avoid this issue? Thanks very much.

  • Just putting this out there: `["".join(s) for s in zip(*[sequence[i::5] for i in range(5)])]`. You can replace the 5 with the bin lengths. – Eric B May 23 '22 at 23:42
  • How many times should the loop run? Build the loop to make it do that. Next: each time through the loop, what should the values be? Use math to get them. – Karl Knechtel May 23 '22 at 23:44
  • Alternately: instead of trying to have a number that represents `x / 20` and multiplying by it, design it so that you multiply by x first and *then* divide by 20, rounding as appropriate. – Karl Knechtel May 23 '22 at 23:44
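The two suggestions from Karl Knechtel above can be sketched roughly as follows. Repeatedly adding 0.05 accumulates rounding error, because 0.05 has no exact binary representation, so the running total can end up slightly above 1.0 on the last pass. Looping over an integer bin index and computing each boundary as "multiply first, then divide" avoids floats entirely. The function name `split_into_bins` is hypothetical, and this sketch uses floor division instead of the `round` call in the question, so boundaries may differ from the original code when the length is not a multiple of the bin count.

```python
def split_into_bins(sequence, bins):
    """Split `sequence` into `bins` contiguous pieces using integer boundaries.

    Hypothetical helper, not from the original post: a sketch of the
    "count the iterations, multiply first, then divide" idea.
    """
    length = len(sequence)
    sub_seqs = []
    for i in range(bins):  # runs exactly `bins` times, no float comparison
        # Integer arithmetic only: i * length is exact, and floor division
        # maps it to a slice index without any rounding error.
        left = i * length // bins
        right = (i + 1) * length // bins
        sub_seqs.append(sequence[left:right])
    return sub_seqs


# Usage with the sequence from the question.
sequence = 'TTAGAGGAAATATGAAAACCCTAGAATCGGAAGAAAACTATATATGTATATCTTTCCGTTGACTTTATATAGAATGAAATCAAGGAAAGAAAAGAGCTAA'
sub_seqs = split_into_bins(sequence, 20)
print(sub_seqs, "\n", len(sub_seqs))
```

Because the last boundary is `bins * length // bins`, which always equals `length`, the final bin always reaches the end of the string, and any remainder from an uneven division is spread across the bins rather than dropped.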
