How to use Python3 sort key if the compare function requires values of items being compared?

Question

Below is an example, where I've used a comparator function that explicitly requires both the items being present (x+y < y+x) to provide a comparison. The question is how do I write the below without using cmp_to_key function, since key takes only one input.

The following program is a solution to the problem:

Given a list of non-negative integers, arrange them such that they form the largest number.

For example, given [3, 30, 34, 5, 9], the largest formed number is 9534330.

from functors import cmp_to_key

def largestNumber(self, nums):
    numStr = [str(i) for i in nums]

    def str_cmp(x, y):
        if y+x < x+y: return -1
        elif y+x > x+y: return 1
        else: return 0

    numStr.sort(key=cmp_to_key(str_cmp))

    return "".join(numStr).lstrip('0') or '0'

What's the reason you don't want to use `cmp_to_key`? (E.g. would you accept a solution that did the same thing by hand?) — user94559, Jun 21 '17 at 21:48
Related: [How does Python's cmp_to_key function work?](https://stackoverflow.com/a/16362818/674039) — wim, Jun 21 '17 at 22:32

MSeifert · Accepted Answer · 2017-06-21T22:24:25.990

3

You could write a custom class that implements __lt__ (the method that implements < comparisons) in a way that you need it:

class Comp(object):
    def __init__(self, value):
        self.value = str(value)

    def __lt__(self, other):
        return other.value + self.value <= self.value + other.value

That should produce an identical "sorting":

>>> sorted([3, 30, 34, 5, 9], key=Comp)
[9, 5, 34, 3, 30]

But I'm not sure if that really does provide a "total ordering" (it could, I just have some doubts) and if it doesn't it could actually produce unexpected results (in any Python version independent of key or cmp argument).

edited Jun 21 '17 at 22:24

answered Jun 21 '17 at 21:58

MSeifert

145,886
38
333
352

2

This is basically rewriting `cmp_to_key`. May as well just use it. – wim Jun 21 '17 at 22:32
@wim The question actually just asked for an approach that didn't require `cmp_to_key`. I had an alternative but that - unfortunatly - failed. Still it should be better than the `cmp_to_key` because it doesn't require casting everything to strings before doing the sort and it could be faster because it only requires 2 string concatenations not 4 (worst-case in the original question). But yeah, basically you're right, it's just re-writing `cmp_to_key`... – MSeifert Jun 21 '17 at 22:48

user94559 · Answer 2 · 2017-06-21T22:43:51.687

Just for another way of doing this:

from itertools import cycle, islice

def extender(n):
    def extend(x):
        s = str(x)
        return "".join(islice(cycle(s), n))
    return extend

def biggest_number(input):
    if (len(input) == 0):
        return 0
    extend = extender(len(str(max(input))) * 2)
    s = sorted(input, key=extend, reverse=True)
    return int("".join(map(str, s)))

Essentially, you take each element of the array and make them the same length by repeating as necessary. Then you do a lexicographical sort. (A numerical sort would be identical at this point, but we want strings when we're done.)

E.g., for [3, 30, 34, 5, 9], we find that the longest number is 2 digits, so we extend everything to be three digits by repeating the digits as needed. These are the keys that get used:

[333, 303, 343, 555, 999]

Then we sort, descending, and assemble the result:

[9, 5, 34, 3, 30]
9534330

The intuition comes from starting with "Pick the number with the biggest leading digit." The issue arises of what we should do with a tie. E.g., why should we pick 3 before 30? The answer is that the biggest digit that could appear after the 3 is another 3. (If there were a bigger digit available, we would have picked it already.) So thinking of the 3 as "333333..." helps us pick the right one. Similar question: why would we pick 10 instead of 100? This leads us to realize that the best result after 10 is another number that starts with 10. (11 or more we would have picked already.) So think of it as "10101010..." and "100100100100...". It turns out, you just need to extend to n*2 digits, where n is the length of the longest number.

I realize that's a bit confusing. I wrote a test to make sure that's all correct. (It compares against your original code.)

from functools import cmp_to_key
import random

def largestNumber(nums):
    numStr = [str(i) for i in nums]

    def str_cmp(x, y):
        if y+x < x+y: return -1
        elif y+x > x+y: return 1
        else: return 0

    numStr.sort(key=cmp_to_key(str_cmp))

    return "".join(numStr).lstrip('0') or '0'

for i in range(1000000):
    input = [random.randint(0, 1000) for _ in range(random.randint(0, 100))]
    if biggest_number(input) != int(largestNumber(input)):
        print("FAILED: {}".format(input))
    if i % 100 == 0:
        print(i)

I have yet to find input that doesn't work. I'm fairly convinced this code is correct.

All of that said, I don't know why you don't want to just use cmp_to_key. :-)

I think I found a failing input, actually. Hang on while I investigate. (This code appears not to be correct.) — user94559, Jun 21 '17 at 22:35
Edited the code to use `n*2` instead of `n+1`. I believe it's now correct. — user94559, Jun 21 '17 at 22:44
If you have a failing case that would be really interesting, partly because I'm not sure if the original code actually provided a "total ordering". — MSeifert, Jun 21 '17 at 22:52
I've lost my failing case now that I fixed the code, but it was something like `1121` and `112`... basically, you need to extend further out to see the difference. (`11211 == 11211`, but `11211121 < 11211211`) I now believe that making everything twice the length of the maximum input number is right. — user94559, Jun 21 '17 at 22:58

How to use Python3 sort key if the compare function requires values of items being compared?

2 Answers2