The code below counts how many values fall within intervals of unequal length in a large data set. The two lists "snp" and "bin_list" are test data, and I have to structure my code as shown.
My problem is that the results differ depending on whether I use "continue" or "snp.remove(site)" in the code.
When I use "continue", I get the following results:
Potri.001G000300up1k 26
Potri.001G000400down1k 26
Potri.001G000300part2 5
However, I get different results when I use "snp.remove(site)" instead:
Potri.001G000300up1k 26
Potri.001G000400down1k 25
Potri.001G000300part2 5
The first results are correct but slow to compute, while the second results are slightly wrong (the count for Potri.001G000400down1k is 25 instead of 26) but fast to compute.
So my question is: how can I fix the bug that appears when I use "snp.remove(site)"?
I am using Python 2.7.12.
Note: I have to iterate over the list "snp" in every iteration of the while loop.
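For reference, here is a minimal sketch of the behaviour I suspect is involved, separate from my script. The function name remove_while_iterating and the small test list are made up for illustration. It removes items from a list inside a for loop over that same list and records which items the loop actually visits:

```python
def remove_while_iterating(values, threshold):
    """Remove items below threshold while looping over the same list."""
    items = list(values)  # work on a copy so the caller's list is untouched
    visited = []
    for item in items:
        visited.append(item)
        if item < threshold:
            # Removing shifts every later item one position to the left,
            # so the loop's next step jumps over the item that moved
            # into the current position.
            items.remove(item)
    return visited, items


visited, remaining = remove_while_iterating([1, 2, 3, 10, 11], 5)
print(visited)    # items the loop actually saw
print(remaining)  # items still in the list afterwards
```

In this small case the loop never visits 2 or 10, which looks like the same kind of skip that could cost one count in my second result. The full script follows.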
#!/usr/bin/env python
def locateBin(Start, End, site):
    return site >= Start and site <= End
snp = ['17', '24', '31', '36', '38', '43', '45', '50', '52', '58', '86', '224', '306', '369', '663', '665', '668', '740', '811', '844', '891', '942', '1059', '1097', '1186', '1371', '1437', '1458', '1487', '1537', '1571', '1720', '1853', '2066', '2238', '2292', '2296', '2332', '2367', '2387', '2483', '2585', '2772', '2856', '2935', '2944', '2966', '2967', '2991', '2992', '3048', '3166', '3211', '3241', '3280', '3350', '3351', '3367', '3373', '3378', '3406', '3449', '3454', '3533', '3573', '3621', '3623', '3643', '3644', '3697', '3745', '3757', '3822', '3867', '3893', '3949', '4094', '4142', '4149', '4260', '4457', '4462', '4511', '4528', '4535', '4622', '4719', '4722', '4775', '4790', '4801', '4863', '4873', '4879', '4928', '5044', '5454', '5498', '5557', '5584', '5805', '6215', '6231', '6243', '6293', '6346', '6365', '6401', '6421', '6616', '6812', '6861', '6925', '7023', '7126', '7341', '7342', '7369', '7412', '7413', '7483', '7501', '7645', '7679', '7681', '7799', '7828', '7896', '7928', '7944', '7950', '7971', '8002', '8003', '8038', '8058', '8092', '8134', '8213', '8224', '8275', '8292', '8323', '8378', '8444', '8481', '8498', '8499', '8504', '8556', '8616', '8660', '8676', '8710', '8773', '8817', '9158', '9228', '9232', '9302', '9321', '9340', '9383', '9429', '9538', '9602', '9691', '9723', '9880', '9914', '10044', '10046', '10068', '10073', '10176', '10192', '10237', '10241', '10300', '10368', '10618', '10742', '10835', '10959', '11025', '11028', '11260', '11275', '11528', '11912', '11986', '12062', '12095', '12347', '12366', '12513', '12560', '12592', '12648']
bin_list = [['Potri.001G000300up1k', 'Chr01', '7391', '8391'], ['Potri.001G000400down1k', 'Chr01', '7391', '8391'], ['Potri.001G000300part2', 'Chr01', '8625', '8860']]
index = 0
count_list = []
while index < len(bin_list):
    num = 0
    el = bin_list[index]
    for site in snp:
        if int(site) < int(el[2]):
            continue
            #snp.remove(site)
        elif locateBin(int(el[2]), int(el[3]), int(site)):
            num += 1
        else:
            count_list.append([el[0], num])
            break
    index += 1
for line in count_list:
    print("%s\t%s\n" % (line[0], line[1])),
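One direction that might keep the speed without modifying "snp" is sketched below. It is only a sketch under assumptions: the function name count_in_bins and the small example data are hypothetical, and it assumes that sorting the positions once and using a binary search (the standard bisect module) is acceptable in place of scanning the whole list for every bin.

```python
from bisect import bisect_left, bisect_right


def count_in_bins(sites, bins):
    """Count sites inside each inclusive [start, end] bin.

    Assumption: converting to int and sorting once up front is acceptable.
    """
    positions = sorted(int(s) for s in sites)
    counts = []
    for name, _chrom, start, end in bins:
        lo = bisect_left(positions, int(start))   # first index with value >= start
        hi = bisect_right(positions, int(end))    # first index with value > end
        counts.append([name, hi - lo])
    return counts


# Hypothetical miniature data in the same shape as "snp" and "bin_list".
example_sites = ['5', '10', '15', '20', '25']
example_bins = [['binA', 'Chr01', '10', '20'],
                ['binB', 'Chr01', '10', '20'],
                ['binC', 'Chr01', '21', '30']]
for name, num in count_in_bins(example_sites, example_bins):
    print("%s\t%s" % (name, num))
```

Because nothing is removed from the list, two bins with the same range (as in my test data) should get the same count, and each bin costs two binary searches instead of a full pass.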