Here's a solution that
- uses a regexp to find all occurrences of an uppercase letter followed by a number in parentheses
- transforms the string pairs from the regexp with a generator expression into (value,key) tuples
- returns the key from the tuple that has the highest value
I also added a main block so that the script can be used as a command-line tool: it reads all lines from one file and then writes the key with the highest value for each line to an output file. The program uses iterators, so it processes the input line by line and stays memory efficient no matter how large the input file is.
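import re

# An uppercase letter, optional whitespace, then a number in parentheses, e.g. "G (48419)".
KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    # Build (value, key) tuples so that max() compares the numeric values first.
    return max((int(v), k) for k, v in KEYVAL.findall(row))[1]

def max_item_lines(fh):
    # Lazily yield one output line per input line.
    for row in fh:
        yield "%s\n" % max_item(row)

def process_file(infilename, outfilename):
    # The with statement closes both files, even if an error occurs.
    with open(infilename) as infile, open(outfilename, "w") as outfile:
        outfile.writelines(max_item_lines(infile))

if __name__ == '__main__':
    import sys
    infilename, outfilename = sys.argv[1:]
    process_file(infilename, outfilename)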
For a single row, you can call:
>>> max_item("A (8) C (4) G (48419) T (2)")
'G'
And to process a complete file:
>>> process_file("inputfile.txt", "outputfile.txt")
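Because max_item_lines only iterates over its argument, any iterable of lines works, which makes it easy to try without real files. The sketch below uses io.StringIO as a stand-in for an open file; the two sample rows and the name fake_file are invented for illustration.

```python
import io
import re

KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    return max((int(v), k) for k, v in KEYVAL.findall(row))[1]

def max_item_lines(fh):
    # Yields one result line per input line, without reading everything first.
    for row in fh:
        yield "%s\n" % max_item(row)

# An in-memory text stream behaves like a file opened for reading.
fake_file = io.StringIO("A (8) C (4) G (48419) T (2)\nA (5) C (1) G (0) T (3)\n")
result = list(max_item_lines(fake_file))
print(result)   # ['G\n', 'A\n']
```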
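If you want an actual Python list containing the key with the highest value for every row, you can use the following (in Python 3, map returns a lazy iterator, so it has to be wrapped in list):
>>> list(map(max_item, open("inputfile.txt")))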
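One caveat: max raises ValueError on an empty sequence, so a row without any letter/number pair (a blank line, for instance) would stop the script. A possible variant, assuming Python 3.4 or later for the default argument of max, returns None for such rows. The name max_item_or_none is hypothetical and not part of the script above.

```python
import re

KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item_or_none(row):
    # default=None is returned when the row contains no matches at all.
    best = max(((int(v), k) for k, v in KEYVAL.findall(row)), default=None)
    return None if best is None else best[1]

print(max_item_or_none("A (8) C (4) G (48419) T (2)"))   # G
print(max_item_or_none(""))                               # None
```

How the caller should treat None (skip the row, write an empty line, or report an error) depends on the data, so that decision is left open here.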