I am trying to split a long string into short, equal-length bins using a while
loop in Python, but I get unexpected behavior when the loop condition compares floating-point numbers.
I expect the following code to return 20 bins of 5 letters each:
sequence = 'TTAGAGGAAATATGAAAACCCTAGAATCGGAAGAAAACTATATATGTATATCTTTCCGTTGACTTTATATAGAATGAAATCAAGGAAAGAAAAGAGCTAA'
seqLength = len(sequence)
bins = 20
binLength = int(round(seqLength / bins))
leftFrame = 0
rightFrame = 0.05
subSeqs = []
while rightFrame <= 1.0:
    subSeq = sequence[int(round(leftFrame * seqLength)) : int(round(rightFrame * seqLength))]
    subSeqs.append(subSeq)
    leftFrame, rightFrame = rightFrame, rightFrame + 0.05
print(f"The sequence is {seqLength} nucleotides. Each bin is {binLength} nt long.")
print(subSeqs, "\n", len(subSeqs))
This returns:
The sequence is 100 nucleotides. Each bin is 5 nt long.
['TTAGA', 'GGAAA', 'TATGA', 'AAACC', 'CTAGA', 'ATCGG', 'AAGAA', 'AACTA', 'TATAT', 'GTATA', 'TCTTT', 'CCGTT', 'GACTT', 'TATAT', 'AGAAT', 'GAAAT', 'CAAGG', 'AAAGA', 'AAAGA']
19
However, changing the loop condition to while rightFrame <= 1.01:
returns the correct output:
The sequence is 100 nucleotides. Each bin is 5 nt long.
['TTAGA', 'GGAAA', 'TATGA', 'AAACC', 'CTAGA', 'ATCGG', 'AAGAA', 'AACTA', 'TATAT', 'GTATA', 'TCTTT', 'CCGTT', 'GACTT', 'TATAT', 'AGAAT', 'GAAAT', 'CAAGG', 'AAAGA', 'AAAGA', 'GCTAA']
20
An upper bound with up to 14 zeros after the decimal point (1.000000000000001) also works, but with a 15th zero (1.0000000000000001) the loop returns only 19 bins again. How can I avoid this issue? Thanks very much.
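For reference, here is a minimal sketch of what appears to be going on, together with one possible way around it. It assumes the cause is rounding error building up from the repeated additions of 0.05, which has no exact binary floating-point representation. The names right_frame, boundaries, and sub_seqs are illustrative only and are not part of the code above.

```python
sequence = 'TTAGAGGAAATATGAAAACCCTAGAATCGGAAGAAAACTATATATGTATATCTTTCCGTTGACTTTATATAGAATGAAATCAAGGAAAGAAAAGAGCTAA'
bins = 20

# Reproduce the accumulation from the loop: start at 0.05, then add 0.05
# nineteen more times, as happens before the 20th check of the condition.
right_frame = 0.05
for _ in range(bins - 1):
    right_frame += 0.05

# Inspect the accumulated value and the result of the comparison with 1.0.
print(repr(right_frame), right_frame <= 1.0)

# Possible alternative (an assumption, not the only fix): compute the bin
# boundaries with integer arithmetic so no float comparison is involved.
seq_length = len(sequence)
boundaries = [i * seq_length // bins for i in range(bins + 1)]
sub_seqs = [sequence[start:end] for start, end in zip(boundaries, boundaries[1:])]

print(sub_seqs)
print(len(sub_seqs))
```

Because the boundaries are plain integers, the number of bins is fixed by range(bins + 1) rather than by a floating-point comparison. If the sequence length is not a multiple of bins, this sketch produces bins whose lengths differ by at most one.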