I have two files representing records with intervals.
file1.txt
a 5 10
a 13 19
a 27 39
b 4 9
b 15 19
c 20 33
c 39 45
and
file2.txt
something id1 a 4 9 commentx
something id2 a 14 18 commenty
something id3 a 1 4 commentz
something id5 b 3 9 commentbla
something id6 b 16 18 commentbla
something id7 b 25 29 commentblabla
something id8 c 5 59 hihi
something id9 c 40 45 hoho
something id10 c 32 43 haha
What I would like to do is write a file containing only those records of file2 whose range (columns 4 and 5) does not overlap any range (columns 2 and 3) of the file1 records with the same key, i.e. where column 3 of file2 is identical to column 1 of file1.
The expected output should be written to the file
test.result
something id3 a 1 4 commentz
something id7 b 25 29 commentblabla
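The overlap condition described above can be sketched as a small predicate. This is only an illustration: `overlaps` is a hypothetical helper name, and it assumes the endpoints are inclusive (ranges that merely touch count as overlapping), which the sample data does not settle either way.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the closed ranges [start_a, end_a] and [start_b, end_b] share a point."""
    # Assumption: endpoints are inclusive, so 1-4 and 4-9 would count as overlapping.
    return start_a <= end_b and start_b <= end_a


# id1 (a 4 9) against the file1 range a 5 10: they overlap.
print(overlaps(4, 9, 5, 10))
# id3 (a 1 4) against the same range: they do not overlap.
print(overlaps(1, 4, 5, 10))
```

A record of file2 would then be kept only if `overlaps` is false for every file1 range with the same key.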
I have tried to use the following Python code:
import csv
with open ('file2') as protein, open('file1') as position, open ('test.result',"r+") as fallout:
    writer = csv.writer(fallout, delimiter=' ')
    for rowinprot in csv.reader(protein, delimiter=' '):
        for rowinpos in csv.reader(position, delimiter=' '):
            if rowinprot[2]==rowinpos[0]:
                if rowinprot[4]<rowinpos[1] or rowinprot[3]>rowinpos[2]:
                    writer.writerow(rowinprot)
This did not work. I got the following result:
something id1 a 4 9 commentx
something id1 a 4 9 commentx
something id1 a 4 9 commentx
which is clearly not what I want.
What did I do wrong? The problem seems to be in the nested loops or the conditions, but I couldn't figure it out.
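For reference, several things in the attempt above appear to combine to produce that output:

- `csv.reader(position)` is rebuilt on every outer iteration, but the underlying file object is already at end-of-file after the first pass, so only the first record of file2 is ever compared.
- The fields returned by `csv.reader` are strings, so `rowinprot[3]>rowinpos[2]` compares text rather than numbers (for example, `'4' > '10'` is true).
- The row is written once for each file1 range that passes the test, rather than once if all ranges with the same key pass.
- Mode `"r+"` requires the output file to exist already and does not truncate it.

A minimal sketch of one possible approach follows. The function names (`load_intervals`, `non_overlapping`, `filter_files`) are hypothetical. It assumes whitespace-separated fields, inclusive endpoints, and that a record whose key never appears in file1 should be kept; none of these is stated explicitly in the question.

```python
from collections import defaultdict


def load_intervals(lines):
    """Group (start, end) pairs by key from lines such as 'a 5 10'."""
    intervals = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, start, end = fields[0], int(fields[1]), int(fields[2])
        intervals[key].append((start, end))
    return intervals


def non_overlapping(record_lines, intervals):
    """Yield records whose range overlaps no interval stored under the same key."""
    for line in record_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, start, end = fields[2], int(fields[3]), int(fields[4])
        # Assumption: endpoints are inclusive, so touching ranges count as overlapping.
        if not any(start <= hi and lo <= end for lo, hi in intervals.get(key, [])):
            yield line.rstrip("\n")


def filter_files(intervals_path, records_path, out_path):
    """Read both files and write the surviving records; mode "w" creates the output file."""
    with open(intervals_path) as f1:
        intervals = load_intervals(f1)
    with open(records_path) as f2, open(out_path, "w") as out:
        for record in non_overlapping(f2, intervals):
            out.write(record + "\n")


# Small in-memory demo using a subset of the sample data from the question.
file1_lines = ["a 5 10", "a 13 19", "a 27 39", "b 4 9", "b 15 19"]
file2_lines = [
    "something id1 a 4 9 commentx",
    "something id3 a 1 4 commentz",
    "something id6 b 16 18 commentbla",
    "something id7 b 25 29 commentblabla",
]
kept = list(non_overlapping(file2_lines, load_intervals(file1_lines)))
print(kept)
```

With the real files, the equivalent call would be `filter_files('file1.txt', 'file2.txt', 'test.result')`. Reading file1 fully into a dictionary first avoids re-reading it for every record of file2, and converting the fields with `int` makes the comparisons numeric.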