1

I have two lists - one for a team name and the second for points

I wanted to sort them based on the points without losing track of the corresponding team. I found something on SO that works great for doing this - and it sorts teams by points in descending order (most to least).

There doesn't appear to be a sort on the second list.

EDIT - Original Data

Teams       Pts
D Clark   0
D Dupuis   2
T Leach   2
J Schutz    0
C Hearder 2
R Pagani  0
M Cameron  2
B Hunter  0

This is what I'm using:

pts, teams = zip(*sorted(zip(pts, teams), reverse=True))
i = 1
for team, num_pts in zip(teams, pts):
    standings = str(i) + '. ' + team + " ("  + num_pts + ")"
    print standings
    i += 1

And it results in this:

1. T Leach (2)
2. M Cameron (2)
3. D Dupuis (2)
4. C Hearder (2)
5. R Pagani (0)
6. J Schutz (0)
7. D Clark (0)
8. B Hunter (0)

What I'm trying to do is to add an additional sort on team name - so if there's a tie, then it will sort by names in ascending order,

Like This:

1. C Hearder (2)
2. D Dupuis (2)
3. M Cameron (2)
4. T Leach (2)
5. B Hunter (0)
6. D Clark (0)
7. J Schutz (0)
8. R Pagani (0)
dbmitch
  • 5,361
  • 4
  • 24
  • 38
  • 3
    You want to call sorted with `key` argument, with a callable, see http://stackoverflow.com/questions/3121979/how-to-sort-list-tuple-of-lists-tuples – metatoaster Oct 09 '16 at 03:50
  • Can you show the original data as well? – thefourtheye Oct 09 '16 at 03:52
  • Sorry - very new to Python - not sure how I would modify that with my two lists - I guess they're kind of like tuples. And what does lambda mean? – dbmitch Oct 09 '16 at 03:54
  • @thefourtheye - sure I'll have to dump it from my html processing. Hold on – dbmitch Oct 09 '16 at 03:55
  • Okay - added original order of data – dbmitch Oct 09 '16 at 03:58
  • @metatoaster - can I do this with two different sort orders? Is there an example of that somewhere. Thanks for everyone's help. SO has been a great resource for examples – dbmitch Oct 09 '16 at 04:15

2 Answers2

2

You have two options here. You could use cmp or key as a parameter to sorted.

With a cmp you're defining a function which takes two arguments, the two values being compared. If the first argument is smaller than the second, return a negative number. If it is larger than the second, return a positive number. If they are equal, return 0.

First example using cmp:

pts = ['2', '2', '2', '2', '2', '0', '0', '0', '0']
teams = ['T Leach', 'M Cameron', 'D Dupuis', 'C Hearder', 'R Pagani',
         'J Schutz', 'D Clark', 'B Hunter']

def custom_cmp(x,y):
    return cmp(x[0], y[0]) or cmp(y[1], x[1])

pts, teams = zip(*sorted(zip(pts, teams), cmp=custom_cmp, reverse=True))

The python or operator will return the left hand side if it evaluates to True (not 0 as described above), otherwise it will return the right hand side. Therefore this custom_cmp returns the comparison of the points from x to y if the points are not equivalent. Otherwise it compares the teams from y to x (the opposite order).

With key on the other hand you are just considering how to view an element individually in order that python will naturally understand how to sort it how you want. We know the first item is being compared properly, so now we just need to flip the second. Numbers would be easy, we could just take the negative. But how do you flip a string?

One approach is to take the ord (which should be an ascii value integer for your strings) of each character in the string. Then you can take the negative of these values and package them into a tuple. This is what one of these would look like:

>>> tuple(-ord(c) for c in 'C Hearder')
(-67, -32, -72, -101, -97, -114, -100, -101, -114)

Second example using key:

def custom_key(pair):
    pts, team = pair
    return (pts, tuple(-ord(c) for c in team))

pts, teams = zip(*sorted(zip(pts, teams), key=custom_key, reverse=True))

Restrospective edit:

If we're defining a custom key we may as well invert the key itself and remove the reverse argument:

def custom_key(pair):
    pts, team = pair
    return (-int(pts), team)

pts, teams = zip(*sorted(zip(pts, teams), key=custom_key))
machine yearning
  • 9,889
  • 5
  • 38
  • 51
  • Thanks - that's perfect - worked well. So everything is based on the cmp function? That works on both strings and numbers? – dbmitch Oct 09 '16 at 04:02
  • 1
    Avoid using `cmp` and use `key` function. It is inefficient. – thefourtheye Oct 09 '16 at 04:05
  • Great, you see what it's doing right? The python `or` operator will return the left hand side if it evaluates to `True`, otherwise it will return the right hand side. So `custom_cmp` returns the comparison of the points from `x` to `y` if the points are not equivalent. Otherwise it compares the names from `y` to `x` (the opposite order). – machine yearning Oct 09 '16 at 04:06
  • 1
    @thefourtheye You don't need to avoid using something simply because it's inefficient. I'm rarely constrained by computational efficiency in simple scripts. Sometimes ease of expression trumps computation time. But if you wanna write an answer using `key` instead, feel free. – machine yearning Oct 09 '16 at 04:08
  • @machineyearning - yes - thanks - that's easy to understand now. Really appreciate it. I'm only sorting 4 lists - each with 8 items at a time - so I'm more concerned about readability and functionality, then efficiency - I'll try and find some examples for the key function - but this is easy to understand – dbmitch Oct 09 '16 at 04:09
  • Nice second example - but I do like the first one! – dbmitch Oct 09 '16 at 04:27
  • @thefourtheye I added an example using the `key` paradigm. – machine yearning Oct 09 '16 at 04:27
  • 1
    @dbmitch I've substantially expanded the explanations, you may find them interesting if you're just learning how these things work. – machine yearning Oct 09 '16 at 04:45
  • Yes - I appreciate that. As an experienced programmer, but Python newbie I really agree with your comment *"Sometimes ease of expression trumps computation time."* – dbmitch Oct 09 '16 at 04:55
1

You can specify how sorted will sort by passing in the key kwarg. One handy way to use key is with itemgetter from the operator module.

For example, if you have a list of tuples (such as by zipping two lists) and you want to sort by the second item, then the first, the key would be itemgetter(1,0) so you might do something like this:

from operator import itemgetter
teams_and_points = zip(teams, pts)
for team, points in sorted(points_and_teams, key=itemgetter(1, 0)):
    print(team, points)
sytech
  • 29,298
  • 3
  • 45
  • 86
  • Where does sort order come in to play? Is that `key=itemgetter(1, 0))` sorting the second item in reverse order, and then the first item in ascending order? – dbmitch Oct 09 '16 at 04:12
  • 1
    The question was about how to sort by ascending in one and descending in the other. – machine yearning Oct 09 '16 at 04:12
  • 2
    `key=lamdba x: (-x[1], x[0])` – thefourtheye Oct 09 '16 at 04:14
  • I'm sure there's a way of doing that. Like how @thefourtheye suggested. – sytech Oct 09 '16 at 04:17
  • @thefourtheye - that looks promising - sorry for being such a newbie with my questions - but what does the `lamda x` refer to and how would that be used in my example? I think I understand that the negative sign in `-x[1]` must mean to sort the second element in reverse order, the `x[0]` indicates sorting the first element in ascending order? – dbmitch Oct 09 '16 at 04:21
  • @dbmitch `lambda x: ` creates a new lambda function and passes that to `key` parameter. `lambda` function bodies can be just expressions and in this case, it is `(-x[1], x[0])`. And you are correct about the rest – thefourtheye Oct 09 '16 at 04:23
  • 1
    @thefourtheye You can't take the negative of a string – machine yearning Oct 09 '16 at 04:50
  • @machineyearning He `zip`s, `teams` and `pts`, so second element would be the point and I assume it would be a number. – thefourtheye Oct 09 '16 at 06:16
  • @thefourtheye oh you're right, I was confused. Weird that he switched the order of the elements like that. – machine yearning Oct 09 '16 at 06:19
  • @machineyearning No problem. The reason I suggest using `key` over `cmp` is that, if the list has `n` elements, `cmp` has to be called `n*(n -1)/2` times, but `key` has to be called just `n` times. – thefourtheye Oct 09 '16 at 06:20
  • @thefourtheye Are you implying that python sorting is order n squared with `cmp` but only order n with `key`? I find that very hard to believe. Pretty sure the same sorting algorithm is used in both cases, meaning the same number of comparisons is performed. – machine yearning Oct 09 '16 at 06:47
  • @machineyearning No. I am not talking about the algorithmic complexity. When you compare two elements from the list, everytime `cmp` function has to be called, whereas `key` will be called once per element and that value is used internally to compare. – thefourtheye Oct 09 '16 at 06:58
  • @thefourtheye You're not explaining why one is more efficient than the other. You may as well be stating two unrelated facts. For example I could say `cmp` is faster since it starts with "c" not "k" like `key`. – machine yearning Oct 09 '16 at 07:04
  • @machineyearning `[{'a':2}, {'a':1}, {'a':4}, {'a':3}]`, how many comparisons will have to be made here to sort this, based on the value of `a`? – thefourtheye Oct 09 '16 at 07:11
  • @machineyearning there will be exactly `N` calls to a `key` function, but min `N` and max `O(N lg N)` calls to a `cmp` function; and `cmp` functions are usually more complex than `key` functions which add to complexity. However `key` wastes more space which in this case is negligible. The `key` function is used for [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform). – Antti Haapala -- Слава Україні Oct 09 '16 at 07:42
  • Furthermore `cmp` was removed from Python 3 exactly because the `key` should have been preferred. – Antti Haapala -- Слава Україні Oct 09 '16 at 07:44
  • @AnttiHaapala But that doesn't change the fact that even using `key` you still have to make `O(N lg N)` comparisons to sort a list. Also your assertion about `cmp` functions being "more complex" is misguided to say the least. I've not seen anyone give a proper example or clear explanation as to why `key` results in more efficient sorting. The only good reason I've seen is that `cmp` was removed in Python 3. – machine yearning Oct 09 '16 at 18:10
  • @machineyearning then please read the article on Wikipedia that I linked 10 hours ago. I didn't invent this stuff, and the difference between `cmp` and `key` in sorting is so profound that there is a name and a Wikipedia article for the latter. – Antti Haapala -- Слава Україні Oct 09 '16 at 18:36
  • @AnttiHaapala I am familiar with the idea. But in this case implementing a Schwartzian transform only gives a constant order of magnitude improvement in sorting time. I'm not sure what your definition of "profound" is but when discussing sorting algorithms constants are usually omitted. – machine yearning Oct 09 '16 at 18:49
  • No, within the same complexity class the constants **are** discussed, that's why Python uses timsort instead of plain merge sort. – Antti Haapala -- Слава Україні Oct 09 '16 at 18:52
  • @AnttiHaapala I don't really disagree with what you're saying. The original question was, is it so egregiously inefficient to use `cmp` that answers using it should be basically ignored? My counter to that argument was that in a typical one-off script you won't be running closely enough to computational bottlenecks for a small constant multiplier to make any difference. So you should just save programmer time and use whichever one you find easier and more natural to express at that moment. In general programmer time is more valuable than CPU time. – machine yearning Oct 09 '16 at 19:06
  • @machineyearning Okay, using `key` over `cmp` was supposed to be just a suggestion. As you already know the reason why `key` is better than `cmp`, please feel free to use whichever is comfortable for you. – thefourtheye Oct 10 '16 at 10:30