Reposting this after receiving a downvote. I went back and tried something, but I guess I'm still not there yet.
I have a tab-separated file with data that looks like this:
name count count1 count3 add1 add2
jack 70 55 31 100174766 100170715
jack 45 656 48 100174766 100174052
john 41 22 89 102268764 102267805
john 47 31 63 102268764 102267908
david 10 56 78 103361093 103368592
There are two conditions I need to check, and one math operation that needs to be done afterwards:
A) which rows have a duplicated value in add1 (a duplicated value always occurs exactly 2 times)
B) for each such pair, which row has the greater value in add2
Let's take jack as an example:
jack 70 55 31 100174766 100170715
jack 45 656 48 100174766 100174052
jack's add1 value (100174766) occurs twice, and 100174052 is the greater add2, so:
row1 = jack 45 656 48 100174766 100174052
row2 = jack 70 55 31 100174766 100170715
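Conditions A and B could be sketched roughly like this. It is only a minimal sketch, assuming the rows are already split into lists of strings with add1 at index 4 and add2 at index 5; the helper name `pair_rows_by_add1` is made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the data above (header omitted).
rows = [
    ["jack", "70", "55", "31", "100174766", "100170715"],
    ["jack", "45", "656", "48", "100174766", "100174052"],
    ["john", "41", "22", "89", "102268764", "102267805"],
    ["john", "47", "31", "63", "102268764", "102267908"],
    ["david", "10", "56", "78", "103361093", "103368592"],
]


def pair_rows_by_add1(rows):
    """Hypothetical helper: return {add1: (row1, row2)} for every add1
    value that occurs exactly twice. row1 is the row with the greater
    add2, row2 is the other one."""
    # Condition A: collect rows that share the same add1 value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[4]].append(row)

    pairs = {}
    for add1, group in groups.items():
        if len(group) == 2:
            # Condition B: compare add2 numerically, greater value first.
            row1, row2 = sorted(group, key=lambda r: int(r[5]), reverse=True)
            pairs[add1] = (row1, row2)
    return pairs


pairs = pair_rows_by_add1(rows)
```

With the sample rows, `pairs` has entries for jack's and john's add1 values only; david's add1 occurs once, so it is skipped.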
Math:
for each count cell (count, count1, count3), column by column across the two rows:
row1 / (row1 + row2)
Output for jack:
jack 0.391304348 0.922644163 0.607594937 100174766 100174052
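The arithmetic for jack can be checked with a small sketch. It assumes the three count columns sit at indices 1 to 3, that name, add1 and add2 are carried over from row1 (as in the output line above), and that row1 + row2 is never zero; the function name `ratio_row` is made up for illustration.

```python
def ratio_row(row1, row2):
    """Hypothetical helper: build the output row from a pair of rows.

    Each count column becomes row1 / (row1 + row2); name, add1 and add2
    are taken from row1, the row with the greater add2.
    """
    ratios = []
    for a, b in zip(row1[1:4], row2[1:4]):
        a, b = float(a), float(b)
        # Assumes a + b != 0; a zero sum would need separate handling.
        ratios.append(a / (a + b))
    return [row1[0]] + ratios + [row1[4], row1[5]]


row1 = ["jack", "45", "656", "48", "100174766", "100174052"]
row2 = ["jack", "70", "55", "31", "100174766", "100170715"]
result = ratio_row(row1, row2)
```

Here `result[1]` is 45 / (45 + 70), `result[2]` is 656 / (656 + 55) and `result[3]` is 48 / (48 + 31), which match the three values in the output for jack.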
Final desired output:
name count count1 count3 add1 add2
jack 0.391304348 0.922644163 0.607594937 100174766 100174052
john 0.534090909 0.58490566 0.414473684 102268764 102267908
Code so far:
I know I have not accounted for which add2 is greater; I'm not sure where and how to do that.
info = []
with open('file.tsv', 'r') as j:
    for i,line in enumerate(j):
        lines = line.strip().split('\t')
        info.append(lines)
uniq = {}
for index,row in enumerate(info, start =1):
    if row.count(row[4]) == 2:
        key = row[4] + ':' + row[5]
        if key not in uniq:
            uniq[key] = row[1:3]
for k, v in sorted(uniq.iteritems()):
    row1 = k,v
    row2 = k,v
    print 'row1: ', row1[0], '\n', 'row2: ',row2[0]
All I see is:
row1: 100174766:100170715
row2: 100174766:100170715
row1: 100174766:100174052
row2: 100174766:100174052
instead of:
row1: 100174766:100170715
row2: 100174766:100174052