I have a tab-delimited input file:
10N06_64 sc635516 93.93 100.0
10N06_64 sc711028 93.99 100.0
10N06_64 sc255425 93.46 95.8
10N06_64 sc115511 87.5 93.0
116F19_238 sc121016 91.30 12.1
116F19_238 sc1132492 90.94 6.1
116F19_238 sc513573 87.38 6.1
116F19_238 sc68511 75.93 10.5
I need to group the rows by the first column (line[0]) and, within each group, print the 3 rows with the highest values in line[3], using line[2] to break ties, so that my output file looks like this:
10N06_64 sc635516 93.93 100.0
10N06_64 sc711028 93.99 100.0
10N06_64 sc255425 93.46 95.8
116F19_238 sc121016 91.30 12.1
116F19_238 sc68511 75.93 10.5
116F19_238 sc1132492 90.94 6.1
This is my attempt, but it prints only the single best line per group. How can I modify it to print the 3 best hits?
import csv
from itertools import groupby
from operator import itemgetter

with open('myfile', 'rb') as f1:
    with open('outfile', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        # Rows sharing the same first column are assumed to be adjacent.
        for group, rows in groupby(reader, itemgetter(0)):
            # max() returns a single row, so only one hit per group is written.
            best = max(rows, key=lambda r: (float(r[3]), float(r[2])))
            writer1.writerow(best)
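One direction that might work is to replace max() with heapq.nlargest, which takes the same kind of key but returns the n best items instead of one. The following is only a sketch under some assumptions: the helper name top_hits is made up for illustration, the rows of each group are assumed to be adjacent in the input (as groupby requires), and columns 3 and 4 are assumed to always parse as floats.

```python
from heapq import nlargest
from itertools import groupby
from operator import itemgetter


def top_hits(rows, n=3):
    """Yield up to n best rows per group (hypothetical helper).

    rows: iterable of lists of strings, already grouped by column 0.
    Ranking key: column 4 (r[3]) first, then column 3 (r[2]), both as floats.
    """
    for _, group in groupby(rows, key=itemgetter(0)):
        # nlargest returns the rows ordered from best to worst by the key.
        for row in nlargest(n, group, key=lambda r: (float(r[3]), float(r[2]))):
            yield row
```

With the csv objects from the attempt above, this could be used as writer1.writerows(top_hits(reader)). Note that the rows of each group come out ordered from best to worst, so rows tied on line[3] are ordered by line[2] rather than by their position in the input file.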