One text file, 'Truth', contains the following values:
0.000000 3.810000 Three
3.810000 3.910923 NNNN
3.910923 5.429000 AAAA
5.429000 7.060000 AAAA
7.060000 8.411000 MMMM
8.411000 8.971000 MMMM
8.971000 13.40600 MMMM
13.40600 13.82700 Zero
13.82700 15.935554 One
Another text file, 'Test', contains the following values:
0.000000 3.810000 Three
3.810000 3.910923 Three
3.910923 5.429000 AAAA
5.429000 7.060000 Three
7.060000 8.411000 Three
8.411000 8.971000 Zero
8.971000 13.40600 Three
13.40600 13.82700 Zero
13.82700 15.935554 Two
15.935554 20.138337 Two
Now I want to replace the labels in Test with MMMM wherever Truth has the label MMMM for the same time range.
The working code that I have so far is:
# Assuming both files have already been read into truth and test
res = []
for j in range(len(truth)):
    if truth[j][2] == 'MMMM' and truth[j][0] == test[j][0] and truth[j][1] == test[j][1]:
        res.append((test[j][0], test[j][1], truth[j][2]))
    else:
        res.append((test[j][0], test[j][1], test[j][2]))
for i in range(len(res)):
    print(res[i])
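For context, the reading step assumed in the comment at the top of that code could look roughly like this. It is only a sketch: `read_intervals` is a hypothetical helper name, and it assumes each line holds three whitespace-separated columns (start, end, label).

```python
def read_intervals(lines):
    """Parse 'start end label' lines into (float, float, str) tuples.

    `lines` can be an open file or any iterable of strings.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rows.append((float(parts[0]), float(parts[1]), parts[2]))
    return rows

# Possible usage (the file paths here are assumptions):
# with open('Truth') as f:
#     truth = read_intervals(f)
# with open('Test') as f:
#     test = read_intervals(f)
```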
My code is ugly but works fine as long as the ranges match exactly. However, I'm unsure how to proceed when the Truth file is much longer than the Test file, i.e. it has more intervals and labels.
For example, my Truth file could look like this:
0.000000 1.00000 MMMM
1.000 3.810000 Three
3.810000 3.910923 NNNN
3.910923 5.429000 AAAA
5.429000 6.0000 MMMM
6.0000 7.060000 AAAA
7.060000 8.411000 MMMM
8.411000 8.971000 MMMM
8.971000 11.00 abcd
11.00 13.40600 MMMM
13.40600 13.82700 Zero
13.82700 15.935554 One
In such a scenario, how do I accurately carry out the label replacements with minimal loss of data?
In other words, how should I define a condition, such as 80% overlap, for replacing a label with MMMM over a given time range? Please advise. Thank you.
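One possible way to express such a criterion, offered as a sketch rather than a definitive answer: for each Test interval, add up how much of its duration is covered by MMMM intervals in Truth, and relabel it when that fraction reaches a threshold. The names `overlap` and `relabel` and the `target` and `threshold` parameters are hypothetical, and the sketch assumes the Truth intervals do not overlap one another (otherwise coverage would be double-counted).

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def relabel(test, truth, target='MMMM', threshold=0.8):
    """Return a copy of `test` where an interval is relabelled `target`
    when at least `threshold` of its duration is covered by `truth`
    intervals carrying that label. Rows are (start, end, label) tuples.
    """
    res = []
    for start, end, label in test:
        duration = end - start
        # Total time within [start, end] that Truth marks as `target`.
        covered = sum(overlap(start, end, t_start, t_end)
                      for t_start, t_end, t_label in truth
                      if t_label == target)
        if duration > 0 and covered / duration >= threshold:
            res.append((start, end, target))
        else:
            res.append((start, end, label))
    return res
```

With the longer Truth file above and `threshold=0.8`, only the 7.060000 to 8.411000 and 8.411000 to 8.971000 rows of Test would become MMMM. The 8.971000 to 13.40600 row is only about 54% covered by MMMM, so it keeps its label unless the threshold is lowered. The loop compares every Test row with every Truth row, which should be acceptable for small files but is not optimised.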