10

I have the following Python array of dictionaries:

myarr = [ { 'name': 'Richard', 'rank': 1 },
{ 'name': 'Reuben', 'rank': 4 },
{ 'name': 'Reece', 'rank': 0 },
{ 'name': 'Rohan', 'rank': 3 },
{ 'name': 'Ralph', 'rank': 2 },
{ 'name': 'Raphael', 'rank': 0 },
{ 'name': 'Robin', 'rank': 0 } ]

I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0.

If I try:

sorted_master_list = sorted(myarr, key=itemgetter('rank'))

then the list is sorted in the order 0-0-0-1-2-3-4.

How can I define a custom comparator function to push zeroes to the bottom of the list? I'm wondering if I can use something like methodcaller.

Roman Bodnarchuk
  • 29,461
  • 12
  • 59
  • 75
Richard
  • 31,629
  • 29
  • 108
  • 145

8 Answers8

24

Option 1:

key=lambda d:(d['rank']==0, d['rank'])

Option 2:

key=lambda d:d['rank'] if d['rank']!=0 else float('inf')

Demo:

"I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0." --original poster

>>> sorted([0,0,0,1,2,3,4], key=lambda x:(x==0, x))
[1, 2, 3, 4, 0, 0]

>>> sorted([0,0,0,1,2,3,4], key=lambda x:x if x!=0 else float('inf'))
[1, 2, 3, 4, 0, 0]

 

Additional comments:

"Please could you explain to me (a Python novice) what it's doing? I can see that it's a lambda, which I know is an anonymous function: what's the bit in brackets?" – OP comment

Indexing/slice notation:

itemgetter('rank') is the same thing as lambda x: x['rank'] is the same thing as the function:

def getRank(myDict):
    return myDict['rank']

The [...] is called the indexing/slice notation, see Explain Python's slice notation - Also note that someArray[n] is common notation in many programming languages for indexing, but may not support slices of the form [start:end] or [start:end:step].

key= vs cmp= vs rich comparison:

As for what is going on, there are two common ways to specify how a sorting algorithm works: one is with a key function, and the other is with a cmp function (now deprecated in python, but a lot more versatile). While a cmp function allows you to arbitrarily specify how two elements should compare (input: a,b; output: a<b or a>b or a==b). Though legitimate, it gives us no major benefit (we'd have to duplicate code in an awkward manner), and a key function is more natural for your case. (See "object rich comparison" for how to implicitly define cmp= in an elegant but possibly-excessive way.)

Implementing your key function:

Unfortunately 0 is an element of the integers and thus has a natural ordering: 0 is normally < 1,2,3... Thus if we want to impose an extra rule, we need to sort the list at a "higher level". We do this by making the key a tuple: tuples are sorted first by their 1st element, then by their 2nd element. True will always be ordered after False, so all the Trues will be ordered after the Falses; they will then sort as normal: (True,1)<(True,2)<(True,3)<..., (False,1)<(False,2)<..., (False,*)<(True,*). The alternative (option 2), merely assigns rank-0 dictionaries a value of infinity, since that is guaranteed to be above any possible rank.

More general alternative - object rich comparison:

The even more general solution would be to create a class representing records, then implement __lt__, __gt__, __eq__, __ne__, __gt__, __ge__, and all the other rich comparison operators, or alternatively just implement one of those and __eq__ and use the @functools.total_ordering decorator. This will cause objects of that class to use the custom logic whenever you use comparison operators (e.g. x=Record(name='Joe', rank=12) y=Record(...) x<y); since the sorted(...) function uses < and other comparison operators by default in a comparison sort, this will make the behavior automatic when sorting, and in other instances where you use < and other comparison operators. This may or may not be excessive depending on your use case.

Cleaner alternative - don't overload 0 with semantics:

I should however point out that it's a bit artificial to put 0s behind 1,2,3,4,etc. Whether this is justified depends on whether rank=0 really means rank=0; if rank=0 are really "lower" than rank=1 (which in turn are really "lower" than rank=2...). If this is truly the case, then your method is perfectly fine. If this is not the case, then you might consider omitting the 'rank':... entry as opposed to setting 'rank':0. Then you could sort by Lev Levitsky's answer using 'rank' in d, or by:

Option 1 with different scheme:

key=lambda d: (not 'rank' in d, d['rank'])

Option 2 with different scheme:

key=lambda d: d.get('rank', float('inf'))

sidenote: Relying on the existence of infinity in python is almost borderline a hack, making any of the mentioned solutions (tuples, object comparison), Lev's filter-then-concatenate solution, and even maybe the slightly-more-complicated cmp solution (typed up by wilson), more generalizable to other languages.

Community
  • 1
  • 1
ninjagecko
  • 88,546
  • 24
  • 137
  • 145
  • Please could you explain to me (a Python novice) what it's doing? I can see that it's a lambda, which I know is an anonymous function: what's the bit in brackets? – Richard Apr 12 '12 at 18:38
  • @Richard: of course =) I'll be happy to explain in my answer so everyone can follow. – ninjagecko Apr 12 '12 at 18:40
1

I'd do

 sortedlist = sorted([x for x in myarr if x['rank']], key=lambda x: x['rank']) + [x for x in myarr if not x['rank']]

bit I guess it could be compressed somehow.

Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
1

I'm more leaning toward creating a compare function to handle the "0" specifically:

def compare(x,y):
    if x == y:
        return 0
    elif x == 0:
        return 1
    elif y == 0:
        return -1
    else:
        return cmp(x,y)

sorted(myarr, cmp=lambda x,y: compare(x,y), key=lambda x:x['rank'])

However, there are performance penalty on the custom compare function.

wilson
  • 177
  • 1
  • 7
  • This is the `cmp` solution. I like the combined use of `key=` and `cmp=` as an elegant exploitation of the way `sorted`'s parameters work in python. In English this says "`comparing elements left,right by rank: they are equal if their ranks are equal, otherwise left is larger if it's 0 or right is larger if it's 0, otherwise perform the default comparison`". Tragically the first two lines are necessary, otherwise the middle lines would return wrong values before the final `cmp`. The alternative would be removing the top two lines, and doing: `if x==0 and y!=0`...`elif y=0 and x!=0`...`else:`. – ninjagecko Apr 12 '12 at 19:26
  • 1
    It's worth noting that `cmp` goes away in Python 3. – senderle Apr 12 '12 at 20:34
-1

A hacky way to do it is:

sorted_master_list = sorted(myarr, key=lambda x: 99999 if x['rank'] == 0 else x['rank'])

This works fairly well if you know your maximum rank.

mensi
  • 9,580
  • 2
  • 34
  • 43
  • This won't work. `itemgetter()` returns a *function*. When you already use a lambda, just use `x['rank']` - you lose the performance advantage of using `itemgetter` anyway. – ThiefMaster Apr 12 '12 at 18:35
-1

You're myarr binding here doesn't look like valid Python code (and doesn't execute in my interpreter session.

Rendering that into:

myarr = {
    'Richard': 1,
    'Reuben': 4,
    'Reece': 0,
    'Rohan': 3,
    'Ralph': 2,
    'Raphael': 0,
    'Robin': 0 }

Gives me something on which I could base an answer.

The recommended way to do custom sorting in Python is to use the DSU (decorate, sort, undecorate) pattern. If you want to sort a dictionary by values then that looks something like:

keys_sorted_by_val = [ x[1] for x in sorted([(v,k) for k,v in myarr.items()])]

... where (v,k) for k,v in myarr.items() is the expression to decorate; sorted() is, obviously, the sort and the outer x[1] for x in ... is the final undecorate step.

Obviously this might seem like a sufficiently common requirement that one might which to wrap this up in a function:

def dict_by_values(d):
    return [ x[1] for x in sorted([(v,k) for k,v in d.items()])]

If you had a collection of object instances that you want to sort by some attribute you can use something like this:

def sort_by_attr(attr, coll):
    results = list()
    for each in coll:
        assert hasattr(each, attr)
        results.append((getattr(each, attr), each))
    results.sort()
    return [x[1] for x in results]

So if we created a class representing your name/rank data like this:

class NameRanking(object):
    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
    def __repr__(self):
        return "%s: %s, %s" %(self.__class__, self.name, self.rank)

... and instantiate a list of those using myarr:

name_rankings = [ NameRanking(k, v) for k,v in myarr.items() ]

... then we could get a sorted copy of that using:

names_rankings_by_rank = sort_by_attr('rank', name_rankings)

(Yes the assert isn't a good idea here; that's where you would put in your own exception handling or throwing code as appropriate to your application).

Jim Dennis
  • 17,054
  • 13
  • 68
  • 116
-2

Just give pass to "key" an arbitrary function or callable object - it is what it takes. itemgetter happens to be one such function -- but it can work with any function you write - it just has to take a single parameter as input, and return an object that is directly compable to achieve the order you want.

In this case:

def key_func(item):
   return item["rank"] if item["rank"] != 0 else -100000

sorted_master_list = sorted(myarr, key=key_func)

(it can also be written as a lambda expression)

jsbueno
  • 99,910
  • 10
  • 151
  • 209
-3

You can use function in key param:

for ass sorting:

sorted_master_list = sorted(myarr, key=lambda x: x.get('rank'))

or to desc:

sorted_master_list = sorted(myarr, key=lambda x: -x.get('rank'))

Also you can read about sorted function here http://wiki.python.org/moin/HowTo/Sorting

Rustem
  • 2,884
  • 1
  • 17
  • 32
  • Because you suggested a stock-standard ascending or descending sort, and the OP wants something slightly different (normal sorting order EXCEPT with the zero-rank elements ordered differently.) You need to read and understand the original question before you post generic answers. – Li-aung Yip Apr 12 '12 at 18:52
-3

try sorted_master_list = sorted(myarr, key=itemgetter('rank'), reverse=True)

nay
  • 864
  • 2
  • 8
  • 12