4

I am a python newbie here, and I have been struck on a rather simple problem - and I am looking for the most efficient way to solve this. So, I have 5 lists as follows:

a,b,c,d,score

where the above lists all have the same size (500 in my case). a,b,c,d are string lists and score is an int list.

What I would like to do is sort a,b,c,d based on ascending or descending sorting of score. So, I would first want to sort score based on a descending pattern, and then sort the corresponding elements in a,b,c,d based on the sorted score list (in the same order).

I was thinking of enumerate to achieve this, but I am wondering if itertools could be used here to make it faster and more efficient.

Any guidance on how this can be achieved would be much appreciated && sorry if this is a 101 question.

AJW
  • 5,569
  • 10
  • 44
  • 57
  • 1
    Please, see similar question here: http://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list – alecxe Mar 25 '13 at 08:51
  • which is in turn the dupe of http://stackoverflow.com/q/9543211/989121. However, if lists are large and python == 2, then `it.izip` would be better than just `zip`. – georg Mar 25 '13 at 08:57

2 Answers2

7
sorted_lists = sorted(izip(a, b, c, d, score), reverse=True, key=lambda x: x[4])
a, b, c, d, score = [[x[i] for x in sorted_lists] for i in range(5)]

In this first step, zip the lists together. This takes the first element from every list and puts them into a tuple, appends that tuple to a new list, then does the same for the second element in every list, and so on. Then we sort this list of tuples by the fifth element (this is from the anonymous function passed into the key argument). We set reverse=True so that the list is descending.

In the second step, we split the lists out using some nested list comprehensions and tuple unpacking. We make a new list of lists, where each inner list is all the first elements of each tuple in sorted_lists. You could do this in one line as below, but I think splitting it into two pieces may be a bit clearer:

a, b, c, d, score = izip(*sorted(izip(a, b, c, d, score), reverse=True,
                         key=lambda x: x[4]))

Here is a generic function that returns a list of tuples, where the tuples are the sorted lists:

def sort_lists_by(lists, key_list=0, desc=False):
    return izip(*sorted(izip(*lists), reverse=desc,
                 key=lambda x: x[key_list]))
EML
  • 435
  • 4
  • 12
  • thanks EML for your awesome solution - I was just wondering - can there be a more generic solution as in, where `x[4]` and `range(5)` can be avoided? I am asking this because one could have more than 5 lists.. and I am not sure it helps me understand when values are hardcoded :( – AJW Mar 25 '13 at 09:09
  • Edited to add a generic function. The `range(5)` is just there to designate the index of each element in the sublists, so we can break each sub-list back out after we've zipped them together. The `x[4]` is just to indicate the list by which we're sorting all the others. In the generic function, these are replaced. – EML Mar 25 '13 at 09:18
  • Interesting, `zip` transposes nested lists. Analogous functionality to transposing arrays. – Eike Mar 25 '13 at 09:19
  • @EML: Thanks a tonne for this. I think it will take a while for me to grasp this completely - but I get the idea of your sort. I will now try this step by step and understand it more clearly. Thanks again, accepted your answer. – AJW Mar 25 '13 at 09:20
  • @EML: I was also wondering if you could care to comment on why you chose to use izip vs zip? :) Also, what does `desc=False` do in the function? Sorry if I am being stupid, but I am not able to see where `desc` is used – AJW Mar 25 '13 at 09:21
  • 1
    `izip` has essentially the same output as `zip`, but rather than allocating a ton of new memory for the entire to-be-constructed combined list, it creates a 'generator', which basically outputs the product one item at a time, recycling memory where possible. It uses much less memory and is faster in most cases. Here's a nice talk by Ray Hettinger where he briefly mentions this: http://www.youtube.com/watch?v=OSGv2VnC0go – EML Mar 25 '13 at 09:24
  • Thanks for the awesome Ray H talk ;) – AJW Mar 25 '13 at 09:28
  • I actually made a typo and forgot to set `reverse=desc`. So you weren't being stupid - just noticing my error. I fixed it. I put `desc=False` in the args so that you can choose whether to sort descending or ascending when calling this generic function. It makes more sense for the default behavior between `sorted` and our new `sort_lists_by` to be the same. – EML Mar 25 '13 at 09:30
  • OK. I kind of guessed that, but wasnt sure if you had some other logic :) – AJW Mar 25 '13 at 09:40
  • I am trying to use the function you have written, and I am wondering how do I pass the lists `a,b,c,d,score` to the function? Do I need to pass a zip here? – AJW Mar 25 '13 at 09:44
  • 1
    OK Got it. Just pass them as a list I assume like `[a,b,c,d,score]` – AJW Mar 25 '13 at 09:48
3

If you are doing a lot of numerical work or array manipulation, it might be worth looking into using numpy. This problem is very easily solved with a numpy array:

In [1]: import numpy as np
In [2]: a = ['hi','hello']
In [3]: b = ['alice','bob']
In [4]: c = ['foo','bar']
In [5]: d = ['spam','eggs']
In [6]: score = [42,17]

From this, make a list of tuples in the format (a,b,c,d,score) and store each one with a dtype (str,str,str,str,int), and you can even give them names ('a','b','c','d','score') to access them later:

In [7]: data = np.array(zip(a,b,c,d,score),
   ...:         dtype = [('a','S5'),('b','S5'),('c','S5'),('d','S5'),('score',int)]
   ...:     )

In [8]: data
Out[8]: 
array([('hi', 'alice', 'foo', 'spam', 42),
       ('hello', 'bob', 'bar', 'eggs', 17)], 
      dtype=[('a', 'S5'), ('b', 'S5'), ('c', 'S5'), ('d', 'S5'), ('score', '<i8')])

The advantage of this array is that you can access all the 'lists' (fields) by their name:

In [9]: data['a']
Out[9]: 
array(['hi', 'hello'], 
      dtype='|S5')

In [10]: data['score']
Out[10]: array([42, 17])

To sort them, just give the name of field you want to sort by:

In [11]: sdata = np.sort(data, order='score')

In [12]: sdata
Out[12]: 
array([('hello', 'bob', 'bar', 'eggs', 17),
       ('hi', 'alice', 'foo', 'spam', 42)], 
      dtype=[('a', 'S5'), ('b', 'S5'), ('c', 'S5'), ('d', 'S5'), ('score', '<i8')])

In [13]: sdata['b']
Out[13]: 
array(['bob', 'alice'], 
      dtype='|S5')
askewchan
  • 45,161
  • 17
  • 118
  • 134