Sort string by character frequency, breaking ties alphabetically

Question

I want to solve this problem:

Given an input string, sort the characters in decreasing order of frequency. If more than one characters had same frequency count, sort them in increasing lexicographical order. Example:

bdca --> abcd,
bdcda -> ddabc,
abba -> aabb,
bacbdc -> bbccad,

My solution involves creating the frequencies in a hash map, sorting the hash map dict items by frequency using sorted() and lambda function. Then for the items with the same frequency (I need to write a subroutine for this), I do another sorted with lambda function.

def string_sort(s):
    hmap = {}
    for char in s:
        if char not in hmap:
            hmap[char] = 1
        else:
            hmap[char] += 1
    freqs = sorted(hmap.items(), key=lambda x: x[1], reverse=True)
    num_occur: list = find_num_occur(freqs)
    sorted_freqs = []
    start_ind = 0
    for f in num_occur:
        tmp_freqs = sorted(freqs[start_ind : start_ind + f], key=lambda x: x[0])
        sorted_freqs.extend(tmp_freqs)
        start_ind = len(sorted_freqs)
    out = []
    for item in sorted_freqs:
        out.extend([item[0]] * item[1])
    return "".join(out)


def find_num_occur(freqs):
    count = 1
    out = []
    for i in range(len(freqs) - 1):
        if freqs[i][1] == freqs[i + 1][1]:
            count += 1
        else:
            out.append(count)
            count = 1
    out.append(count)
    return out

The solution is not elegant. I was told I can solve it easier if using comparators, but I don't know how to use comparators in python. Any suggestions? Or any other more elegant solution?

Thanks.

Does this answer your question? [Using a comparator function to sort](https://stackoverflow.com/questions/12749398/using-a-comparator-function-to-sort) — Serial Lazer, Nov 09 '20 at 08:24
About vocabulary: "lexicographical order" doesn't make sense for a single character. You probably meant "alphabetical order". "Lexicographical order" is about comparing sequences. — Stef, Nov 09 '20 at 12:01

juanpa.arrivillaga · Answer 1 · 2020-11-09T08:33:46.970

4

You don't need to use comparators, you can use a key-function just fine. There's a lot of unnecessary complexity in your solution, all you need is something to the effect of:

>>> from collections import Counter
>>> def transmogrify(s):
...     counts = Counter(s)
...     return ''.join(sorted(s, key=lambda c: (-counts[c], c)))
...
>>> transmogrify('bdca')
'abcd'
>>> transmogrify('bdcda')
'ddabc'
>>> transmogrify('abba')
'aabb'
>>> transmogrify('bacbdc')
'bbccad'

Note, a collections.Counter is just a dict that is specialized for counting, since it is a common enough pattern.

>>> Counter('bacbdc')
Counter({'b': 2, 'c': 2, 'a': 1, 'd': 1})

edited Nov 09 '20 at 08:33

answered Nov 09 '20 at 08:28

juanpa.arrivillaga

88,713
10
131
172

Alternatively you could use `Counter.most_common` to re-expand into the originals, since it's going to return the most common element first e.g.`''.join(l*n for l, n in Counter(s).most_common())` – Masklinn Nov 09 '20 at 08:47
No, you can't use just the count, consider `'ccbb'`. With what you have, it will simply return `'ccbb'` but it should return `'bbcc'` – juanpa.arrivillaga Nov 09 '20 at 08:48
Aye you're right I missed the lexical ordering fallback. – Masklinn Nov 09 '20 at 08:49
Thanks for the responses. @juanpa.arrivillaga I guess the problem is that I don't really understand the key function you provided here: `key=lambda c: (-counts[c], c)`. Can you please explain what it means to have these two arguments (-counts[c], c)? Thanks. – Martin Nov 09 '20 at 09:30
1

@Martin those aren't two arguments. That is a tuple, the first item in the tuple, contains the count of that character (well, negative, because you want to sort in decreasing order and by default, `sorted` sorts in increasing order), the second element contains the character. Tuples are sorted lexicographically. So first it compares the first element of the tuple, and if there is a tie, it comparees the second element (and so on, and so forth, for example `(1, 'b') < (1, 'c')` just like `'ab' < 'ac'`) – juanpa.arrivillaga Nov 09 '20 at 09:36

ZhaoYi · Answer 2 · 2020-11-09T12:41:31.033

Counter & lt Method

I think your task can be divided into 2 tasks:

count task, this could be implemented easily by Counter class, but Counter cannot guarantee the order of its keys, so you need task 2
sort task (comparators like CPP), you need a custom sort func, so you can create a new class and implement the __lt__ method, doc

from collections import Counter

class Item:
    def __init__(self, ch, times):
        self.ch = ch
        self.times = times
    def __lt__(self, other):
        if self.times > other.times:
            return True
        if self.times == other.times:
            return self.ch < other.ch
        return False
# my restruct 
def restruct(inp):
    c = Counter(inp)
    data = sorted([Item(ch, times) for ch, times in c.items()])
    return ''.join([item.ch*item.times for item in data])

Following restruct a very elegant implement (with the same ability) by @juanpa.arrivillaga, thank him.

def restruct(inp):
    c = Counter(inp)
    return ''.join(sorted(inp, key=lambda ch: Item(ch, c[ch])))

Then try restruct('bacbdc') get bbccad, bingo!!!

You can just use `sorted(inp, key=lambda ch: Item(ch, c[ch]))` — juanpa.arrivillaga, Nov 09 '20 at 12:13
Cool !!! Very elegant, now I know the `key` arg of `sorted` func, thanks, :)! — ZhaoYi, Nov 09 '20 at 12:30

Sort string by character frequency, breaking ties alphabetically

2 Answers2

Counter & __lt__ Method

Counter & lt Method