2

I'm having trouble with this. I have 10,000 rows in my dictionary, and this is one of them:

Example: A (8) C (4) G (48419) T (2) when printed out

I'd like to get 'G' as an answer, since it has the highest value.

I'm currently using Python 2.4 and have no idea how to solve this, as I'm quite new to Python.

Thanks a lot for any help given :)

bluish
Vincent
  • Give us two or three rows from your dictionary and the output you expect for each. – eumiro Feb 07 '11 at 09:44
  • Duplicate of http://stackoverflow.com/questions/268272/getting-key-with-maximum-value-in-dictionary – bluish Feb 07 '11 at 09:47
  • Why are you using a release from 2005? –  Feb 07 '11 at 10:04
  • I submitted an answer to get the highest value from a single row, but is your actual use-case to get the highest value in the whole 10,000 row file? Or to get a list of highest values for every row? – shang Feb 07 '11 at 13:28

5 Answers

3

Here's a solution that

  1. uses a regexp to scan all occurrences of an uppercase letter followed by a number in brackets
  2. transforms the string pairs from the regexp with a generator expression into (value,key) tuples
  3. returns the key from the tuple that has the highest value

I also added a main function so that the script can be used as a command line tool: it reads all lines from one file and writes the key with the highest value for each line to an output file. The program uses iterators, so it stays memory efficient no matter how large the input file is.

import re
KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    return max((int(v),k) for k,v in KEYVAL.findall(row))[1]

def max_item_lines(fh):
    for row in fh:
        yield "%s\n" % max_item(row)

def process_file(infilename, outfilename):
    infile = open(infilename)
    max_items = max_item_lines(infile)
    outfile = open(outfilename, "w")
    outfile.writelines(max_items)
    outfile.close()
    infile.close()  # also release the input file handle

if __name__ == '__main__':
    import sys
    infilename, outfilename = sys.argv[1:]
    process_file(infilename, outfilename)

For a single row, you can call:

>>> max_item("A (8) C (4) G (48419) T (2)")
'G'
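
To make the three steps listed at the top of this answer easier to follow, here is a rough sketch that runs them one at a time on the example row. The intermediate names (`pairs`, `value_key`, `best_value`, `best_key`) are made up for illustration and are not part of the script above.

```python
import re

# Same pattern as in the answer: one uppercase letter, optional
# whitespace, then digits in parentheses.
KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

row = "A (8) C (4) G (48419) T (2)"

# Step 1: findall returns one (letter, digits) pair of strings per match:
# [('A', '8'), ('C', '4'), ('G', '48419'), ('T', '2')]
pairs = KEYVAL.findall(row)

# Step 2: swap each pair to (int value, letter) so that tuples are
# compared by value first.
value_key = [(int(v), k) for k, v in pairs]

# Step 3: max picks the tuple with the largest value; the second
# element of that tuple is the letter.
best_value, best_key = max(value_key)
```

Two things to keep in mind with this approach: if two letters share the highest value, the tuple comparison falls back to the letter, so the alphabetically later one wins; and a row with no matches at all makes `max` raise a `ValueError`.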

And to process a complete file:

>>> process_file("inputfile.txt", "outputfile.txt")

If you want an actual Python list holding the key with the maximum value for every row, then you can use:

>>> map(max_item, open("inputfile.txt"))
shang
1
max(d.itervalues())

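This returns the highest value (48419 for the example row), not its key. In Python 2 it also avoids building the intermediate list that d.values() creates, though the difference only matters for large dictionaries.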
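
Since the question asks for the letter rather than the number, one possible way to get the key is sketched below. The dictionary `d` is an assumed shape for a single row (the question does not show how its data is stored), and the code avoids `max(..., key=...)` because that argument was only added in Python 2.5.

```python
# Hypothetical dictionary for one row, built from the question's example.
d = {"A": 8, "C": 4, "G": 48419, "T": 2}

# Highest value only, as in the one-liner above.
highest_value = max(d.values())

# Pair each value with its key, then take the key from the largest pair.
# Putting the value first makes the tuple comparison use it first.
highest_key = max([(value, key) for key, value in d.items()])[1]
```

`d.items()` is used instead of `d.iteritems()` only so that the sketch runs unchanged on both Python 2 and Python 3.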

Iacks
1

Try the following:

st = "A (8) C (4) G (48419) T (2)" # your start string
a=st.split(")")
b=[x.replace("(","").strip() for x in a if x!=""]
c=[x.split(" ") for x in b]
d=[(int(x[1]),x[0]) for x in c]
max(d)[1] # 'G' is your result; max(d) alone is the tuple (48419, 'G')
phimuemue
0

Use regular expressions to split the line. Then for all the matched groups, you have to convert the matched strings to numbers, get the maximum, and figure out the corresponding letter.

import re
r = re.compile(r'A \((\d+)\) C \((\d+)\) G \((\d+)\) T \((\d+)\)')
for line in my_file:
  m = r.match(line)
  if not m:
    continue # or complain about invalid line
  value, n = max((int(value), n) for (n, value) in enumerate(m.groups()))
  print "ACGT"[n], value
DS.
  • Sorry, this is Python2.6. You can do this in Python2.4, but you'll probably need square brackets inside max, i.e. max([...]). – DS. Feb 07 '11 at 09:51
  • In Python2.6 the following works and returns `9`: `max(i for i in xrange(10))` – eumiro Feb 07 '11 at 09:59
  • 2.4 is old and not so shiny, and it lacks most of Python's more recent awesomeness. But it does have generator expressions. –  Feb 07 '11 at 10:03
  • Thanks, I've used regular expressions in the earlier part and it works fine. Can I use this to find the highest value from A(8)C(4)G(48419)T(2)? `from operator import itemgetter; d = a_value, c_value, g_value, t_value; sorted(d, key=itemgetter(0)); print d` – Vincent Feb 07 '11 at 10:12
  • itemgetter won't help you this way: first you need to pair the number with the value you want to return (i.e. A, C, G, T). Other answers achieve the same with reversing and sorting tuples. – DS. Feb 07 '11 at 13:11
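
The pairing described in the last comment might look like the sketch below. The names `a_value`, `c_value`, `g_value` and `t_value` come from Vincent's comment and are given the example row's numbers here; `pairs`, `ranked` and `top_letter` are made up for illustration.

```python
from operator import itemgetter

# Names taken from the comment above; the numbers are the example row.
a_value, c_value, g_value, t_value = 8, 4, 48419, 2

# Pair each count with its letter so the letter survives the comparison.
pairs = [(a_value, "A"), (c_value, "C"), (g_value, "G"), (t_value, "T")]

# sorted() returns a new list and leaves pairs unchanged, so the
# result has to be kept in a variable.
ranked = sorted(pairs, key=itemgetter(0), reverse=True)
top_letter = ranked[0][1]
```

With the numbers paired this way, `itemgetter(0)` does have something to select. Sorting is more work than a single `max(pairs)[1]`, but it also gives the full ranking if that is needed.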
0
row = "A (8) C (4) G (48419) T (2)"

lst = row.replace("(",'').replace(")",'').split() # ['A', '8', 'C', '4', 'G', '48419', 'T', '2']

dd = dict(zip(lst[0::2],map(int,lst[1::2]))) # {'A': 8, 'C': 4, 'T': 2, 'G': 48419} 

max(map(lambda k:[dd[k],k], dd))[1] # 'G'
pyanon