3

Im trying to sort a list of objects based on frequency of occurrence (increasing order) of characters. Im seeing that the sort behaves differently if list has numbers versus characters. Does anyone know why this is happening?

Below is a list of numbers sorted by frequency of occurrence.

# Sort list of numbers based on increasing order of frequency
nums = [1,1,2,2,2,3]
countMap = collections.Counter(nums)
nums.sort(key = lambda x: countMap[x])
print(nums)

# Returns correct output
[3, 1, 1, 2, 2, 2]

But If I sort a list of characters, the order of 'l' and 'o' is incorrect in the below example:

# Sort list of characters based on increasing order of frequency
alp = ['l', 'o', 'v', 'e', 'l', 'e', 'e', 't', 'c', 'o', 'd', 'e']
countMap = collections.Counter(alp)
alp.sort(key = lambda x: countMap[x])
print(alp)

# Returns Below output - characters 'l' and 'o' are not in the correct sorted order
['v', 't', 'c', 'd', 'l', 'o', 'l', 'o', 'e', 'e', 'e', 'e']

# Expected output
['v', 't', 'c', 'd', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e']
user1399344
  • 63
  • 1
  • 1
  • 4
  • 1
    Very nice question :) thanks for explaining and demonstrating what puzzled you – Patrick Artner Nov 12 '20 at 20:02
  • Can you clarify what you mean by "the correct sort order"? Both ``l`` and ``o`` appear twice, so the sort key establishes no preference on their relative order. – MisterMiyagi Nov 12 '20 at 20:25

1 Answers1

3

Sorting uses stable sort - that means if you have the same sorting criteria for two elements they keep their relative order/positioning (here it being the amount of 2 for both of them).

from collections import Counter
# Sort list of characters based on increasing order of frequency
alp = ['l', 'o', 'v', 'e', 'l', 'e', 'e', 't', 'c', 'o', 'd', 'e']
countMap = Counter(alp)
alp.sort(key = lambda x: (countMap[x], x))  # in a tie, the letter will be used to un-tie
print(alp)

['c', 'd', 't', 'v', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e'] 

This fixes it by using the letter as second criteria.

To get your exact output you can use:

# use original position as tie-breaker in case counts are identical 
countMap = Counter(alp)
pos = {k:alp.index(k) for k in countMap}

alp.sort(key = lambda x: (countMap[x], pos[x]))
print(alp)

['v', 't', 'c', 'd', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e']

See Is python's sorted() function guaranteed to be stable? or https://wiki.python.org/moin/HowTo/Sorting/ for details on sorting.

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • 1
    This works, but it seems the "expected" output uses the first index in the list as the 2nd check. The OP has `'v'` first, not `'c'`. I think you can fix this by doing something like: `alp_idx = alp.copy()` and then `key = lambda x: (countMap[x], alp_idx.index(x))` – gen_Eric Nov 12 '20 at 20:11
  • 1
    @Rocket good catch - I added a somewhat different positional check based on your suggestion and on the keys already captured by the Counter – Patrick Artner Nov 12 '20 at 20:17