6

How to sort a list to end up with:

['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

Assume a shuffled version of it for convenience:

['bb', 'a', 'B', 'BB', 'AAA', 'BBB', 'b', 'aa', 'aaa', 'A', 'AA', 'bbb']

I tried sorting by ignoring case:

l = sorted(l, key=lambda x: x.lower())

which results in ['a', 'A', 'aa', 'AA', 'aaa', 'AAA']


From answers below, there are two solutions for the mixed case, I'm not sure which is better.

L = ['ABC1', 'abc1', 'ABC2', 'abc2', 'Abc']
l = sorted(L, key=lambda x: "".join([y.lower() + y.swapcase() for y in x]))
print(l)
l = sorted(L, key=lambda x: [(c.lower(), c.isupper()) for c in x])
print(l)
minion
  • 561
  • 4
  • 17

3 Answers3

9

You can use sorted() with a custom key function:

>>> L = ['bb', 'a', 'B', 'BB', 'AAA', 'BBB', 'b', 'aa', 'aaa', 'A', 'AA', 'bbb']
>>> sorted(L, key=lambda x: (x[0].lower(), x[0].isupper(), len(x)))
['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

This works by comparing each element's first character lowercased first, then the element's case and finally its length.

P.S. To also handle mixed-case and mixed-character elements you'd need to compare tuples for individual characters, e.g.:

>>> L = ['ab', 'aA', 'bb', 'a', 'B', 'BB', 'b', 'aa', 'A', 'AA']
>>> sorted(L, key=lambda x: [(c.lower(), c.isupper()) for c in x])
['a', 'aa', 'aA', 'ab', 'A', 'AA', 'b', 'bb', 'B', 'BB']
Eugene Yarmash
  • 142,882
  • 41
  • 325
  • 378
  • Could you please explain more about sorting by the key of a 'tuple' method? Where is the official source? I don't see it from [sorted](https://docs.python.org/3.6/library/functions.html#sorted) or [HOWTO](https://docs.python.org/3.6/howto/sorting.html#sortinghowto) – minion Jun 15 '18 at 10:12
  • 1
    The key function returns a value that you would like to be the item's "key" for sorting, in this case a 3-element tuple. So instead of comparing the items directly, the tuples are compared. Tuples are compared position by position: the first items are compared; if they are the same then the second items are compared, and so on. See also https://stackoverflow.com/q/5292303/244297 – Eugene Yarmash Jun 15 '18 at 10:30
  • Is `len(x)` really necessary for the result? – minion Jun 15 '18 at 10:53
  • 2
    This would sort `ab` possibly before `aa` which seems weird. Would make more sense to have `x` somewhere in there too I'd think. – Voo Jun 15 '18 at 11:06
  • @minion: Yes. Try putting `'BBB'` before `'B'` in the input list. – Eugene Yarmash Jun 15 '18 at 11:27
  • The first approach assumes that there is no empty string in `L` (it's trivial to handle that corner case, though). – chi Jun 15 '18 at 12:31
6

TLDR

result = sorted(lst, key=lambda s: [(c.lower(), c.isupper()) for c in s])

You can transform each string to a list of tuples, one per character. A tuple for a character c takes a form (c.lower(), c.isupper()). The usual list comparison gives your desired sort.

lst = ["a", "aa", "aaa", "A", "AA", "AAA", "b", "bb", "bbb", "B", "BB", "BBB"]

lsts = [[(c.lower(), c.isupper()) for c in s] for s in lst]

# [[('a', False)],
# [('a', False), ('a', False)],
# [('a', False), ('a', False), ('a', False)],
# [('a', True)],
# [('a', True), ('a', True)],
# [('a', True), ('a', True), ('a', True)],
# [('b', False)],
# [('b', False), ('b', False)],
# [('b', False), ('b', False), ('b', False)],
# [('b', True)],
# [('b', True), ('b', True)],
# [('b', True), ('b', True), ('b', True)]]

res = ["".join(c.upper() if u else c for c, u in ls) for ls in lsts]

Recovering the result:

['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

Note that there are many distinct ways to order mixed-case elements consistent with the OPs original example. This approach is the only reasonable sort that I can think of which arises from an anti-symmetric order relation. In particular, this sort admits no equivalent elements that are not equal.

For example, ['aAa', 'aaA'] and ['aaA', 'aAa'] will lead to the same output of ['aaA', 'aAa'].

hilberts_drinking_problem
  • 11,322
  • 3
  • 22
  • 51
  • I wasn't sure what the desired mixed case behavior was. Comparing lists seemed more reasonable. – hilberts_drinking_problem Jun 15 '18 at 09:46
  • 1
    What about the given ordering is anti-symmetric? I'm not entirely sure if the given sorting order is really an equivalence relation (if it isn't you just introduced a really fun bug), but symmetry seems the one requirement that on first glance seems fine. – Voo Jun 15 '18 at 11:20
  • @Voo I meant that sorting by this key has the property that `x <= y` and `y <= x` implies that `x == y`. The term "antisymmetric" is used in the same sense as in the usual definition of an equivalence relation, but of course the resulting relation `<=` does not have to be an equivalence relation. – hilberts_drinking_problem Jun 15 '18 at 11:39
  • I like antisymmetry here because it will lead to the same output for any permutation of an input list. Of course, this may not be useful depending on OPs intentions. – hilberts_drinking_problem Jun 15 '18 at 11:45
  • 1
    Ah yes I see what you mean. Yes it's a very useful property to have even if not strictly required! – Voo Jun 15 '18 at 12:02
2

Short answer :

sorted(l, key=lambda x: "".join([y.lower() + y.swapcase() for y in x]))

Each word is transformed by doubling each letter, first letter is the lower version of the letter, second letter is the swaped version. Second letter is swaped in order to have lowercase sorted before uppercase.

Olivier Cazade
  • 777
  • 5
  • 10