142

Or, practically, how can I sort a list of dictionaries by multiple keys?

I have a list of dicts:

b = [{u'TOT_PTS_Misc': u'Utley, Alex', u'Total_Points': 96.0},
 {u'TOT_PTS_Misc': u'Russo, Brandon', u'Total_Points': 96.0},
 {u'TOT_PTS_Misc': u'Chappell, Justin', u'Total_Points': 96.0},
 {u'TOT_PTS_Misc': u'Foster, Toney', u'Total_Points': 80.0},
 {u'TOT_PTS_Misc': u'Lawson, Roman', u'Total_Points': 80.0},
 {u'TOT_PTS_Misc': u'Lempke, Sam', u'Total_Points': 80.0},
 {u'TOT_PTS_Misc': u'Gnezda, Alex', u'Total_Points': 78.0},
 {u'TOT_PTS_Misc': u'Kirks, Damien', u'Total_Points': 78.0},
 {u'TOT_PTS_Misc': u'Worden, Tom', u'Total_Points': 78.0},
 {u'TOT_PTS_Misc': u'Korecz, Mike', u'Total_Points': 78.0},
 {u'TOT_PTS_Misc': u'Swartz, Brian', u'Total_Points': 66.0},
 {u'TOT_PTS_Misc': u'Burgess, Randy', u'Total_Points': 66.0},
 {u'TOT_PTS_Misc': u'Smugala, Ryan', u'Total_Points': 66.0},
 {u'TOT_PTS_Misc': u'Harmon, Gary', u'Total_Points': 66.0},
 {u'TOT_PTS_Misc': u'Blasinsky, Scott', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Carter III, Laymon', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Coleman, Johnathan', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Venditti, Nick', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Blackwell, Devon', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Kovach, Alex', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Bolden, Antonio', u'Total_Points': 60.0},
 {u'TOT_PTS_Misc': u'Smith, Ryan', u'Total_Points': 60.0}]

and I need to use a multi key sort reversed by Total_Points, then not reversed by TOT_PTS_Misc.

This can be done at the command prompt like so:

a = sorted(b, key=lambda d: (-d['Total_Points'], d['TOT_PTS_Misc']))

But I have to run this through a function, where I pass in the list and the sort keys. For example, def multikeysort(dict_list, sortkeys):.

How can the lambda line be used which will sort the list, for an arbitrary number of keys that are passed in to the multikeysort function, and take into consideration that the sortkeys may have any number of keys and those that need reversed sorts will be identified with a '-' before it?

martineau
  • 119,623
  • 25
  • 170
  • 301
simi
  • 1,535
  • 3
  • 12
  • 14
  • Does this answer your question? [sort Python list with two keys but only one in reverse order](https://stackoverflow.com/questions/37693373/sort-python-list-with-two-keys-but-only-one-in-reverse-order) – user202729 Aug 14 '21 at 13:41

8 Answers8

119

This article has a nice rundown on various techniques for doing this. If your requirements are simpler than "full bidirectional multikey", take a look. It's clear the accepted answer and the blog post I just referenced influenced each other in some way, though I don't know which order.

In case the link dies here's a very quick synopsis of examples not covered above:

mylist = sorted(mylist, key=itemgetter('name', 'age'))
mylist = sorted(mylist, key=lambda k: (k['name'].lower(), k['age']))
mylist = sorted(mylist, key=lambda k: (k['name'].lower(), -k['age']))
Iulian Onofrei
  • 9,188
  • 10
  • 67
  • 113
Scott Stafford
  • 43,764
  • 28
  • 129
  • 177
  • 1
    As near as I can tell, stygianvision uses my code and gives no credit. Google for `result = cmp(fn(left), fn(right))` – hughdbrown Aug 12 '14 at 20:50
  • 5
    Thanks for the synopsis, Link is actually dead now. :) – Amyth Dec 04 '15 at 07:56
  • thanks, exactly what i was looking for. this should be the accepted answer maybe. – philx_x Dec 09 '21 at 14:01
  • 2
    There is no detailed explanation in the link provided, just a general ideas which are obvious already. Here is better explanation on the official python website: https://docs.python.org/3/howto/sorting.html – user3151858 Dec 12 '21 at 13:18
93

This answer works for any kind of column in the dictionary -- the negated column need not be a number.

def multikeysort(items, columns):
    from operator import itemgetter
    comparers = [((itemgetter(col[1:].strip()), -1) if col.startswith('-') else
                  (itemgetter(col.strip()), 1)) for col in columns]
    def comparer(left, right):
        for fn, mult in comparers:
            result = cmp(fn(left), fn(right))
            if result:
                return mult * result
        else:
            return 0
    return sorted(items, cmp=comparer)

You can call it like this:

b = [{u'TOT_PTS_Misc': u'Utley, Alex', u'Total_Points': 96.0},
     {u'TOT_PTS_Misc': u'Russo, Brandon', u'Total_Points': 96.0},
     {u'TOT_PTS_Misc': u'Chappell, Justin', u'Total_Points': 96.0},
     {u'TOT_PTS_Misc': u'Foster, Toney', u'Total_Points': 80.0},
     {u'TOT_PTS_Misc': u'Lawson, Roman', u'Total_Points': 80.0},
     {u'TOT_PTS_Misc': u'Lempke, Sam', u'Total_Points': 80.0},
     {u'TOT_PTS_Misc': u'Gnezda, Alex', u'Total_Points': 78.0},
     {u'TOT_PTS_Misc': u'Kirks, Damien', u'Total_Points': 78.0},
     {u'TOT_PTS_Misc': u'Worden, Tom', u'Total_Points': 78.0},
     {u'TOT_PTS_Misc': u'Korecz, Mike', u'Total_Points': 78.0},
     {u'TOT_PTS_Misc': u'Swartz, Brian', u'Total_Points': 66.0},
     {u'TOT_PTS_Misc': u'Burgess, Randy', u'Total_Points': 66.0},
     {u'TOT_PTS_Misc': u'Smugala, Ryan', u'Total_Points': 66.0},
     {u'TOT_PTS_Misc': u'Harmon, Gary', u'Total_Points': 66.0},
     {u'TOT_PTS_Misc': u'Blasinsky, Scott', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Carter III, Laymon', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Coleman, Johnathan', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Venditti, Nick', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Blackwell, Devon', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Kovach, Alex', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Bolden, Antonio', u'Total_Points': 60.0},
     {u'TOT_PTS_Misc': u'Smith, Ryan', u'Total_Points': 60.0}]

a = multikeysort(b, ['-Total_Points', 'TOT_PTS_Misc'])
for item in a:
    print item

Try it with either column negated. You will see the sort order reverse.

Next: change it so it does not use extra class....


2016-01-17

Taking my inspiration from this answer What is the best way to get the first item from an iterable matching a condition?, I shortened the code:

from operator import itemgetter as i

def multikeysort(items, columns):
    comparers = [
        ((i(col[1:].strip()), -1) if col.startswith('-') else (i(col.strip()), 1))
        for col in columns
    ]
    def comparer(left, right):
        comparer_iter = (
            cmp(fn(left), fn(right)) * mult
            for fn, mult in comparers
        )
        return next((result for result in comparer_iter if result), 0)
    return sorted(items, cmp=comparer)

In case you like your code terse.


Later 2016-01-17

This works with python3 (which eliminated the cmp argument to sort):

from operator import itemgetter as i
from functools import cmp_to_key

def cmp(x, y):
    """
    Replacement for built-in function cmp that was removed in Python 3

    Compare the two objects x and y and return an integer according to
    the outcome. The return value is negative if x < y, zero if x == y
    and strictly positive if x > y.

    https://portingguide.readthedocs.io/en/latest/comparisons.html#the-cmp-function
    """

    return (x > y) - (x < y)

def multikeysort(items, columns):
    comparers = [
        ((i(col[1:].strip()), -1) if col.startswith('-') else (i(col.strip()), 1))
        for col in columns
    ]
    def comparer(left, right):
        comparer_iter = (
            cmp(fn(left), fn(right)) * mult
            for fn, mult in comparers
        )
        return next((result for result in comparer_iter if result), 0)
    return sorted(items, key=cmp_to_key(comparer))

Inspired by this answer How should I do custom sort in Python 3?

zeraien
  • 165
  • 11
hughdbrown
  • 47,733
  • 20
  • 85
  • 108
  • This works the best because I can use the reverse on any keys or columns. Thank you! – simi Jul 17 '09 at 17:10
  • So this does work well. I call my function with the list and string as parameters. I split the string first then call the multikeysort with the list and the list of keys from the split string. It does not matter which item in the string has the '-' at the start of the column name, because it will work with either item or all items. Awesome. Thank you. – simi Jul 17 '09 at 17:30
  • Let's say that there are items in the list of dicts (b) which do not have the keys that I would like to sort by. How would I test for them? I tried a try/except, but it's not returning anything. – simi Jul 17 '09 at 19:47
  • You have a list of dictionaries, some with all the columns required, some without? Or sometimes you want to use this on a list of dictionaries and you pass in a list of columns to sort on and the dictionaries are totally of the wrong type? – hughdbrown Jul 17 '09 at 21:52
  • In the example, I was only using the list of dictionaries that contained the required columns, or rather, the columns I want to sort by. But, the actual data, and I guess I should have made that part of the example, has dictionaries with various columns, some of which may not even part of the two columns I need to sort by. I use a filter to give me a list of dictionaries that have at least one of the columns I sort by, I just want to make sure I take into consideration all different types of scenarios. Some columns may even have None. Sorry if I am confusing everything. – simi Jul 20 '09 at 14:08
  • comparers = [ ((itemgetter(col[1:].strip()), -1) if col.startswith('-') else (itemgetter(col.strip()), 1)) for col in columns] why am i getting a syntax error on this line near the if? – Suyash Aug 12 '14 at 19:51
  • @Suyash: I don't know. Which version of python are you using? The ternary operator was added in version 2.5, so if you are using something earlier than that (no later than September 2006), then you will have a syntax error. – hughdbrown Aug 12 '14 at 20:46
  • Huh. When I google for `result = cmp(fn(left), fn(right))`, I can see that this code has traveled all over the web. https://www.google.com/#q=%22result+%3D+cmp(fn(left)%2C+fn(right))%22 – hughdbrown Aug 12 '14 at 20:59
  • Should be a built-in in Python. Excellent! (I always come back to use it) – Daniel F Oct 22 '14 at 16:17
  • This doesn't sort the 2nd column. Just the first. – Joel Dec 12 '15 at 20:34
  • @Joel Check this pastebin: http://pastebin.com/abGKRU5Z Things equal on the first key are together and they are sorted by the second key, descending order by the first key, ascending order by the second. No? – hughdbrown Dec 13 '15 at 03:10
  • 4
    `cmp()` isn't available for Python3, so I had to define it myself, as mentioned here: http://stackoverflow.com/a/22490617/398514 – pferate Feb 02 '16 at 19:03
  • @pferate: was the code marked "Later 2016-01-17 This works with python3 (which eliminated the cmp argument to sort)" not working for you? – hughdbrown Feb 03 '16 at 19:34
  • 8
    @hughdbrown: You removed the `cmp` keyword, but the `cmp()` function is still used 4 lines above. I tried it with 3.2, 3.3, 3.4 and 3.5, they all failed at the function call, because `cmp()` is not defined. The third bullet here (https://docs.python.org/3.0/whatsnew/3.0.html#ordering-comparisons) mentions treating `cmp()` as gone. – pferate Feb 04 '16 at 16:58
  • shouldn't be `cmp(fn(left), fn(right)) * -mult` since sorted relies on `__lt__` ? – C. Claudio Nov 03 '22 at 12:46
  • Note python 3 has `functools.cmp_to_key`, see https://docs.python.org/3/library/functools.html#functools.cmp_to_key. See also https://stackoverflow.com/a/23756830 – dfrankow Apr 03 '23 at 18:16
  • Also, the HOWTO shows how to use itemgetter and attrgetter to use multiple fields: https://docs.python.org/3/howto/sorting.html#operator-module-functions – dfrankow Apr 03 '23 at 18:25
  • @dfrankow re: using itemgetter/attrgetter on multiple fields. Your comment is true, but does not allow for sorting to be on the negation of fields. Yes, you can `reversed= True` but how does that work if you want to reverse on one field but not the other? So that is why I wrote the code like that. – hughdbrown Apr 04 '23 at 20:06
73

I know this is a rather old question, but none of the answers mention that Python guarantees a stable sort order for its sorting routines such as list.sort() and sorted(), which means items that compare equal retain their original order.

This means that the equivalent of ORDER BY name ASC, age DESC (using SQL notation) for a list of dictionaries can be done like this:

items.sort(key=operator.itemgetter('age'), reverse=True)
items.sort(key=operator.itemgetter('name'))

Note how the items are first sorted by the "lesser" attribute age (descending), then by the "major" attribute name, leading to the correct final order.

The reversing/inverting works for all orderable types, not just numbers which you can negate by putting a minus sign in front.

And because of the Timsort algorithm used in (at least) CPython, this is actually rather fast in practice.

bugmenot123
  • 1,069
  • 1
  • 18
  • 33
wouter bolsterlee
  • 3,879
  • 22
  • 30
  • 3
    very nice. for moderate data sets where sorting the set multiple times doesn't matter, this is super cool! As you point out, you have to reverse the python sort compared to the sql sort. Thanks. – Greg Oct 13 '16 at 17:08
  • 1
    The second sort will break the result of the first. Funny that none of upvoters noticed it. – volcano Aug 06 '18 at 14:31
  • 16
    funny that you did not notice that the primary sort criterion goes last, as shown in my example, and explicitly mentioned in the other comment to make it very clear in case you did not notice. – wouter bolsterlee Aug 06 '18 at 19:14
  • It seems that using `reverse=True` on the primary sort criterion screws this up? Try with list `l = [[4, "b"], [1, "a"], [2, "a"], [2, "b"], [3, "b"]]` – Locane Jun 02 '21 at 21:39
  • seems just fine: `>>> l = [[4, "b"], [1, "a"], [2, "a"], [2, "b"], [3, "b"]] >>> random.shuffle(l) >>> l [[4, 'b'], [1, 'a'], [2, 'b'], [3, 'b'], [2, 'a']] >>> l.sort(key=itemgetter(1)) >>> l.sort(key=itemgetter(0), reverse=True) >>> l [[4, 'b'], [3, 'b'], [2, 'a'], [2, 'b'], [1, 'a']] ` – wouter bolsterlee Jun 04 '21 at 09:51
25
def sortkeypicker(keynames):
    negate = set()
    for i, k in enumerate(keynames):
        if k[:1] == '-':
            keynames[i] = k[1:]
            negate.add(k[1:])
    def getit(adict):
       composite = [adict[k] for k in keynames]
       for i, (k, v) in enumerate(zip(keynames, composite)):
           if k in negate:
               composite[i] = -v
       return composite
    return getit

a = sorted(b, key=sortkeypicker(['-Total_Points', 'TOT_PTS_Misc']))
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
15

I had a similar issue today - I had to sort dictionary items by descending numeric values and by ascending string values. To solve the issue of conflicting directions, I negated the integer values.

Here's a variant of my solution - as applicable to OP

sorted(b, key=lambda e: (-e['Total_Points'], e['TOT_PTS_Misc']))

Very simple - and works like a charm

[{'TOT_PTS_Misc': 'Chappell, Justin', 'Total_Points': 96.0},
 {'TOT_PTS_Misc': 'Russo, Brandon', 'Total_Points': 96.0},
 {'TOT_PTS_Misc': 'Utley, Alex', 'Total_Points': 96.0},
 {'TOT_PTS_Misc': 'Foster, Toney', 'Total_Points': 80.0},
 {'TOT_PTS_Misc': 'Lawson, Roman', 'Total_Points': 80.0},
 {'TOT_PTS_Misc': 'Lempke, Sam', 'Total_Points': 80.0},
 {'TOT_PTS_Misc': 'Gnezda, Alex', 'Total_Points': 78.0},
 {'TOT_PTS_Misc': 'Kirks, Damien', 'Total_Points': 78.0},
 {'TOT_PTS_Misc': 'Korecz, Mike', 'Total_Points': 78.0},
 {'TOT_PTS_Misc': 'Worden, Tom', 'Total_Points': 78.0},
 {'TOT_PTS_Misc': 'Burgess, Randy', 'Total_Points': 66.0},
 {'TOT_PTS_Misc': 'Harmon, Gary', 'Total_Points': 66.0},
 {'TOT_PTS_Misc': 'Smugala, Ryan', 'Total_Points': 66.0},
 {'TOT_PTS_Misc': 'Swartz, Brian', 'Total_Points': 66.0},
 {'TOT_PTS_Misc': 'Blackwell, Devon', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Blasinsky, Scott', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Bolden, Antonio', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Carter III, Laymon', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Coleman, Johnathan', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Kovach, Alex', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Smith, Ryan', 'Total_Points': 60.0},
 {'TOT_PTS_Misc': 'Venditti, Nick', 'Total_Points': 60.0}]
volcano
  • 3,578
  • 21
  • 28
5

I use the following for sorting a 2d array on a number of columns

def k(a,b):
    def _k(item):
        return (item[a],item[b])
    return _k

This could be extended to work on an arbitrary number of items. I tend to think finding a better access pattern to your sortable keys is better than writing a fancy comparator.

>>> data = [[0,1,2,3,4],[0,2,3,4,5],[1,0,2,3,4]]
>>> sorted(data, key=k(0,1))
[[0, 1, 2, 3, 4], [0, 2, 3, 4, 5], [1, 0, 2, 3, 4]]
>>> sorted(data, key=k(1,0))
[[1, 0, 2, 3, 4], [0, 1, 2, 3, 4], [0, 2, 3, 4, 5]]
>>> sorted(a, key=k(2,0))
[[0, 1, 2, 3, 4], [1, 0, 2, 3, 4], [0, 2, 3, 4, 5]]
mumrah
  • 361
  • 4
  • 3
0
from operator import itemgetter
from functools import partial

def _neg_itemgetter(key, d):
    return -d[key]

def key_getter(key_expr):
    keys = key_expr.split(",")
    getters = []
    for k in keys:
        k = k.strip()
        if k.startswith("-"):
           getters.append(partial(_neg_itemgetter, k[1:]))
        else:
           getters.append(itemgetter(k))

    def keyfunc(dct):
        return [kg(dct) for kg in getters]

    return keyfunc

def multikeysort(dict_list, sortkeys):
    return sorted(dict_list, key = key_getter(sortkeys)

Demonstration:

>>> multikeysort([{u'TOT_PTS_Misc': u'Utley, Alex', u'Total_Points': 60.0},
                 {u'TOT_PTS_Misc': u'Russo, Brandon', u'Total_Points': 96.0}, 
                 {u'TOT_PTS_Misc': u'Chappell, Justin', u'Total_Points': 96.0}],
                "-Total_Points,TOT_PTS_Misc")
[{u'Total_Points': 96.0, u'TOT_PTS_Misc': u'Chappell, Justin'}, 
 {u'Total_Points': 96.0, u'TOT_PTS_Misc': u'Russo, Brandon'}, 
 {u'Total_Points': 60.0, u'TOT_PTS_Misc': u'Utley, Alex'}]

The parsing is a bit fragile, but at least it allows for variable number of spaces between the keys.

Torsten Marek
  • 83,780
  • 21
  • 91
  • 98
  • But, when I have the second item in the string with a '-', it gives me a bad operand type for unary - error. – simi Jul 17 '09 at 16:57
  • You can't take the negative of a string. – Torsten Marek Jul 17 '09 at 17:14
  • Yes, I know, but this is how the parameters are passed in. Even if I do a split, one or the other will start with '-'. I think the sortkeys need to be split before calling key_getter, that way each item in the keys list will check the first character. Am I on the right track? – simi Jul 17 '09 at 17:26
0

Since you're already comfortable with lambda, here's a less verbose solution.

>>> def itemgetter(*names):
    return lambda mapping: tuple(-mapping[name[1:]] if name.startswith('-') else mapping[name] for name in names)

>>> itemgetter('a', '-b')({'a': 1, 'b': 2})
(1, -2)
A. Coady
  • 54,452
  • 8
  • 34
  • 40
  • This does not work. I have: values = ['-Total_Points', 'TOT_PTS_Misc'] then b as the list of dicts When I call g = itemgetter(values)(b) I get AttributeError: 'list' object has no attribute 'startswith' – simi Jul 17 '09 at 19:06
  • It takes a variable number of names, not a list of names. Call it like this: itemgetter(*values). Have a look at the similar builtin operator.itemgetter for another example. – A. Coady Jul 17 '09 at 19:14