0

Using iteration and comprehensions, I created a Python dictionary that maps keys to values (stats related to certain features). Each key is a unique ID (in my case, a gene). Each value is a list of tuples, each giving the startpoint and length of a feature (in this case an ORF, in other words a potential protein-coding sequence) of that gene ID. Any given gene can have many such features. The general form is as follows:

{key1: [(startpoint1, length1)], key2: [(startpoint1, length1), (startpoint2, length2), ...], key3: []}

As shown below (in a sample dictionary), some keys have only one feature (a single tuple), while others can have 100 or more. For simplicity, I have shown seq1 with 3 tuples. There can also be keys with no features, for example seq3 and seq4.

{'seq2': [(1,6)], 'seq1': [(1, 12), (16, 9), (32,9)], 'seq3': [], 'seq4': []}

I want to iterate through this dictionary to get the "startpoint" at which the "length" is maximal. In my example, the answer I should get is

startpoint 1 (in seq1), because it has the largest length value (12) among all the entries.

I find it hard to iterate over multiple tuples.

Siva C
  • What do you find hard in iterating over multiple tuples? Are you facing any issue with any attempts? If so please post that as well as the issue you are facing with it – Anand S Kumar Aug 08 '15 at 04:01
  • related: http://stackoverflow.com/questions/5098580/implementing-argmax-in-python also notice [max can take an iterable and a custom function](https://docs.python.org/2/library/functions.html#max) – Paul Aug 08 '15 at 04:03

3 Answers

1

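This piece of code should do it.

myDict = {'seq2': [(1, 6)], 'seq1': [(1, 12), (16, 9), (32, 9)], 'seq3': [], 'seq4': []}

maxLength = 0
maxStartingPoint = 0
maxSeq = ""

for sequence in myDict:
    # Each value is a list of (startpoint, length) tuples; empty lists are simply skipped
    for start, length in myDict[sequence]:
        if length > maxLength:
            maxLength = length
            maxStartingPoint = start
            maxSeq = sequence

# Result: maxSeq == 'seq1', maxStartingPoint == 1, maxLength == 12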
1

You can also use itertools.chain.from_iterable, passing it dict.values() (all the values in your dictionary), and then use max with the key argument to get back the tuple with the largest value at index 1 (the second element). Example -

>>> from itertools import chain
>>> d = {'seq2': [(1,6)], 'seq1': [(1, 12), (16, 9), (32,9)], 'seq3': [], 'seq4': []}
>>> max(chain.from_iterable(d.values()),key=lambda x: x[1])
(1, 12)

itertools.chain.from_iterable takes an iterable of iterables and chains their elements into a single iterator (wrapped in list() below to show the result). Example -

>>> l = [(1,2),(3,4)]
>>> list(chain.from_iterable(l))
[1, 2, 3, 4]
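
Note that the max call above returns only the tuple, not the gene it belongs to. A minimal sketch of one way to keep the key as well, assuming Python 3 and using a generator expression in place of chain (the names candidates, best_gene, best_start, and best_length are illustrative and not part of the original answer):

```python
d = {'seq2': [(1, 6)], 'seq1': [(1, 12), (16, 9), (32, 9)], 'seq3': [], 'seq4': []}

# Pair every (start, length) tuple with the key it came from.
# Keys with an empty list contribute nothing to the generator.
candidates = ((gene, start, length)
              for gene, features in d.items()
              for start, length in features)

# Compare on the length only (index 2 of each triple).
best_gene, best_start, best_length = max(candidates, key=lambda t: t[2])
print(best_gene, best_start, best_length)  # seq1 1 12
```

If several features share the maximum length, max returns the first one it encounters, so the result then depends on the dictionary's iteration order.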
Anand S Kumar
  • Thanks for that. I had tried by nesting for loops and getting lost in the syntax. – Siva C Aug 08 '15 at 04:41
  • You might want to use `itervalues` instead of `values` to reduce the memory footprint: in Python 2, `values` creates a list copy of the dictionary's values, whereas `itervalues` does not. – Hai Vu Aug 08 '15 at 16:20
0

Here is my approach: create a list of tuples (length, start_point, key), this way, the max function can just pick the tuple with the maximum length.

def maxlen(seq):
    # items() works on both Python 2 and 3 (iteritems() exists only on Python 2)
    longest = max((length, start_point, key) for key, value in seq.items() for start_point, length in value)
    return longest

# Test
seq = {'seq2': [(1,6)], 'seq1': [(1, 12), (16, 9), (32,9)], 'seq3': [], 'seq4': []}
length, start_point, key = maxlen(seq)  # 12, 1, 'seq1'
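
One caveat: max raises ValueError when given an empty sequence, which would happen here if no key had any feature. A hedged sketch of a variant that returns None in that case, assuming Python 3.4 or later (where max accepts a default argument); the function name maxlen_or_none is hypothetical:

```python
def maxlen_or_none(seq):
    """Return (length, start_point, key) for the longest feature, or None."""
    triples = ((length, start_point, key)
               for key, features in seq.items()
               for start_point, length in features)
    # default= is returned instead of raising ValueError on an empty iterable
    return max(triples, default=None)


print(maxlen_or_none({'seq3': [], 'seq4': []}))  # None
print(maxlen_or_none({'seq2': [(1, 6)], 'seq1': [(1, 12), (16, 9)]}))  # (12, 1, 'seq1')
```

The caller then has to check for None before unpacking the result.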
Hai Vu