I'm a Python noob. After a few hours of googling and searching Stack Overflow, I failed to find a solution to my problem:
I use an external script to read files containing information about molecule activities. Once read, the data will be in a list in the following form:
INACT67481 -10.84
That is, the name of the molecule and its activity value, separated by a single space. The length of the molecule name varies greatly.
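For illustration, splitting one such entry into a name and a number could look like the sketch below. It assumes each entry is a plain string whose last space-separated field is the value; `parse_entry` is a made-up name, not part of the external script.

```python
def parse_entry(entry):
    """Split a 'NAME VALUE' string into (name, value as float)."""
    # Split once from the right, so only the last space separates
    # the name from the value, whatever the length of the name.
    name, value = entry.rsplit(" ", 1)
    return name, float(value)


print(parse_entry("INACT67481 -10.84"))  # ('INACT67481', -10.84)
```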
Now, the trouble is, each molecule may have multiple (up to n) values, and only the highest should be preserved, while making sure the order is not changed (beyond removing the duplicates with smaller values).
With the help of threads such as this and this, I know how I could simply delete the duplicates, but I am rather lost as to how I could delete only the ones with the smaller values, without resorting to a horrible mess of loops.
EDIT: I can also rewrite the file-parsing script in Python, if having the data in a different form would prove easier.
EDIT: Sample data:
CHEMBL243059.smi 11.75
CHEMBL115092.smi 10.49
CHEMBL244771.smi 10.79
CHEMBL471221.smi 10.78
CHEMBL573301.smi 10.77
CHEMBL469583.smi 10.77
CHEMBL115092.smi 10.97
CHEMBL244771.smi 8.95
CHEMBL16781.smi 10.76
CHEMBL440776.smi 10.76
CHEMBL243059.smi 10.75
CHEMBL115092.smi 10.69
Should return:
CHEMBL243059.smi 11.75
CHEMBL244771.smi 10.79
CHEMBL471221.smi 10.78
CHEMBL573301.smi 10.77
CHEMBL469583.smi 10.77
CHEMBL115092.smi 10.97
CHEMBL16781.smi 10.76
CHEMBL440776.smi 10.76
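The behaviour the sample describes could be sketched roughly as follows: each molecule's highest-valued entry stays at its original position, and all of its other entries are dropped. This is only a sketch under assumptions: the entries are strings of the form `name value`, `keep_highest` is a hypothetical name, and on an exact tie the first occurrence is kept (the sample contains no such tie, so that choice is a guess).

```python
def keep_highest(entries):
    """Keep only each molecule's highest-valued entry, preserving list order."""
    best = {}  # name -> (highest value seen so far, index of that entry)
    for index, entry in enumerate(entries):
        name, value = entry.rsplit(" ", 1)
        value = float(value)
        # Strict '>' means the first occurrence wins on a tie (assumption).
        if name not in best or value > best[name][0]:
            best[name] = (value, index)
    keep = {index for _, index in best.values()}
    return [entry for index, entry in enumerate(entries) if index in keep]


data = [
    "CHEMBL243059.smi 11.75",
    "CHEMBL115092.smi 10.49",
    "CHEMBL244771.smi 10.79",
    "CHEMBL471221.smi 10.78",
    "CHEMBL573301.smi 10.77",
    "CHEMBL469583.smi 10.77",
    "CHEMBL115092.smi 10.97",
    "CHEMBL244771.smi 8.95",
    "CHEMBL16781.smi 10.76",
    "CHEMBL440776.smi 10.76",
    "CHEMBL243059.smi 10.75",
    "CHEMBL115092.smi 10.69",
]

for line in keep_highest(data):
    print(line)
```

This makes one pass to record where each molecule's highest value sits and a second pass to filter, so there are no nested loops over the list.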