I have an input file with 16 whitespace-separated columns:
con13 tr|M0VCZ1| 91.39 267 23 0 131 211 1 267 1 480 239 267 33.4 99.6
con13 tr|M8B287| 97.12 590 17 0 344 211 1 267 0 104 239 590 74.0 99.8
con15 tr|M0WV77| 92.57 148 11 0 73 516 1 148 2 248 256 148 17.3 99.3
con15 tr|C5WNQ0| 85.14 148 22 0 73 516 1 178 4 233 256 148 17.3 99.3
con15 tr|B8AQC2| 83.78 148 24 0 73 516 1 148 6 233 256 148 17.3 99.3
con18 tr|G9HXG9| 99.66 293 1 0 144 102 1 293 7 527 139 301 63.1 97.0
con18 tr|M0XCZ0| 98.29 293 5 0 144 102 1 293 2 519 139 301 63.1 97.0
I need to: 1) group the lines by con (line[0]) and iterate within each group (using groupby); 2) sort each group by line[2] from lowest to highest; 3) check within each group whether line[0], line[8] and line[9] are the same; 4) if they are, remove the repeated lines, keeping only the one with the highest value in line[2], and write the results to a new .txt file. My output file should look like this:
con13 tr|M8B287| 97.12 590 17 0 344 211 1 267 0 104 239 590 74.0 99.8
con15 tr|M0WV77| 92.57 148 11 0 73 516 1 148 2 248 256 148 17.3 99.3
con15 tr|C5WNQ0| 85.14 148 22 0 73 516 1 178 4 233 256 148 17.3 99.3
con18 tr|G9HXG9| 99.66 293 1 0 144 102 1 293 7 527 139 301 63.1 97.0
My attempted script prints only a single con and does not sort:
from itertools import groupby
f1 = open('example.txt','r')
f2 = open('result1', 'w')
f3 = open('result2.txt','w')
for k, g in groupby(f1, key=lambda x:x.split()[0]):
    seen = set()
    for line in g:
        hsp = tuple(line.rsplit())
        if hsp[8] and hsp[9] not in seen:
            seen.add(hsp)
            f2.write(line.rstrip() + '\n')
        else:
            f3.write(line.rstrip() + '\n')
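For reference, the four steps described above could be sketched roughly as follows. This is only one possible reading of the requirements, not a definitive solution. It assumes that "similar" means the values in line[8] and line[9] are exactly equal, that the input is already ordered by line[0] (groupby only merges adjacent lines), and that each group should be sorted from highest to lowest line[2], since the expected output lists the highest value first. The function name `dedupe_hits` is hypothetical.

```python
from itertools import groupby


def dedupe_hits(lines):
    """Yield one line per distinct (line[8], line[9]) pair within each con group."""
    # Split every non-empty line into its whitespace-separated fields.
    rows = [line.split() for line in lines if line.strip()]
    # Assumption: rows with the same line[0] are adjacent in the input.
    for con, group in groupby(rows, key=lambda r: r[0]):
        # Sort by line[2] as a number, highest first, so the first row seen
        # for a given key is the one with the highest value.
        ranked = sorted(group, key=lambda r: float(r[2]), reverse=True)
        seen = set()
        for row in ranked:
            key = (row[8], row[9])  # assumption: "similar" means equal
            if key not in seen:
                seen.add(key)
                # Fields are re-joined with single spaces.
                yield " ".join(row)
```

To produce the output file, an open file object can be passed as `lines` and each yielded string written out followed by a newline. Two details of the attempted script are worth noting in comparison: `hsp[8] and hsp[9] not in seen` is evaluated as `hsp[8] and (hsp[9] not in seen)`, and because `seen` stores whole tuples, a single field is never found in it, so the condition is always true. The sketch instead stores and tests the `(line[8], line[9])` pair itself.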