4

How can I sort a list in Python first by the length of the words (longest to shortest), and then alphabetically?

Here is what I mean:

I have this list: WordsArray = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]

I want to output this: ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'dolor', 'ipsum', 'Lorem', 'amet', 'elit', 'sed', 'sit', 'do']

I can already sort alphabetically using print (sorted(WordsArray)):

['Lorem', 'adipiscing', 'amet', 'consectetur', 'do', 'dolor', 'eiusmod', 'elit', 'incididunt', 'ipsum', 'sed', 'sit', 'tempor']
Willem Van Onsem

3 Answers

6

First, sorted on its own does not sort alphabetically. Look at your output: I am pretty sure L does not come before a. What you are currently doing is a case-sensitive sort.

You can perform a case-insensitive sort by using a Key Function like so:

>>> words_list = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]
>>> sorted(words_list, key=str.lower)
['adipiscing', 'amet', 'consectetur', 'do', 'dolor', 'eiusmod', 'elit', 'incididunt', 'ipsum', 'Lorem', 'sed', 'sit', 'tempor']

You can then modify the Key Function like below to sort first on length then alphabetically:

>>> def custom_key(word):  # named "word" to avoid shadowing the built-in str
...   return -len(word), word.lower()
... 
>>> sorted(words_list, key=custom_key)
['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'dolor', 'ipsum', 'Lorem', 'amet', 'elit', 'sed', 'sit', 'do']
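A related point that may help: sorted is stable, so items with equal keys keep their input order. That is why sorting on length alone leaves 'Lorem', 'ipsum', 'dolor' in their original order, and it also means the same result can be had with two passes (secondary key first, then primary). This is only a sketch of that alternative; the variable names are mine, not from the question.

```python
words = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing",
         "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]

# Length only: ties are NOT alphabetized, they simply keep their input order.
by_length_only = sorted(words, key=len, reverse=True)
print(by_length_only[5:8])  # ['Lorem', 'ipsum', 'dolor']

# Two stable passes: alphabetical first, then by length (longest first).
alphabetical = sorted(words, key=str.lower)
two_pass = sorted(alphabetical, key=len, reverse=True)

# One pass with a tuple key, as in the answer above.
one_pass = sorted(words, key=lambda w: (-len(w), w.lower()))

print(two_pass == one_pass)  # True
```

The two-pass form can be handy when the primary key cannot simply be negated (for example, when it is itself a string).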
Sash Sinha
  • Solution works, but have a doubt. With `lower`, the order comes in : `..'dolor', 'ipsum'..` , but without it, the order is `..'ipsum', 'dolor'..` How can `lower` make a difference there? – Kaushik NP Sep 29 '17 at 08:49
  • @KaushikNP Sorry I don't understand? if you look at OP's code `dolor` is before `ipsum` and it is also before in both of my examples? – Sash Sinha Sep 29 '17 at 08:51
  • As i said, Your solution works. Trying some variations on my own, I noticed that behaviour. – Kaushik NP Sep 29 '17 at 08:52
  • `>>> sorted(words_list, key=lambda x: (-len(x)))` gives `=> ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'Lorem', 'ipsum', 'dolor', 'amet', 'elit', 'sit', 'sed', 'do']` – Kaushik NP Sep 29 '17 at 08:53
  • @KaushikNP You mean when you don't sort alphabetically, it doesn't get sorted alphabetically? Shocking. – Stefan Pochmann Sep 29 '17 at 09:09
4

You can use a tuple as the key, with the negative length of the string, -len(x), as its first element and x itself as its second:

sorted(WordsArray, key=lambda x: (-len(x),x))

Tuples are compared by their first element, then, in case of a tie, by the second element, and so on. So we first compare the -len(x) of the two strings, which means that the longer string is sorted first.

If both strings have the same length, we compare on x itself, so alphabetically.
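To make the tuple comparison concrete, here is a small sketch that compares the keys directly. The helper name length_then_alpha is just for illustration; it is the same key as the lambda above.

```python
def length_then_alpha(word):
    # Same key as the lambda: negative length first, then the word itself.
    return (-len(word), word)

# Different lengths: (-11, ...) < (-10, ...), so the longer word sorts first.
print(length_then_alpha("consectetur") < length_then_alpha("adipiscing"))  # True

# Same length: the first elements tie, so the strings themselves decide.
print(length_then_alpha("dolor") < length_then_alpha("ipsum"))  # True

# Same length, mixed case: 'L' has a smaller code point than 'd'.
print(length_then_alpha("Lorem") < length_then_alpha("dolor"))  # True
```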

Mind that comparing two strings is case-sensitive: Python orders them lexicographically, using the code point (ord(..)) of each character in turn. If you want to order alphabetically, you should convert upper case and lower case to the same case first. A quick way to do this is:

sorted(WordsArray, key=lambda x: (-len(x),x.lower()))

But this is not always correct: for instance, the Eszett (ß) in German is sometimes transliterated as ss. In fact, sorting alphabetically is a very hard problem in some languages, and in those cases you need to specify a collation.
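As a rough illustration of the ß issue, str.casefold() folds ß to ss where str.lower() leaves it alone. Note that this is only case folding, not a real collation; for locale-aware ordering the standard library offers locale.strxfrm as a key, but its result depends on the locales installed on the system, so it is not shown here. The sample words are made up for the example.

```python
# lower() leaves the Eszett alone; casefold() folds it to "ss".
print("ß".lower() == "ß")      # True
print("ß".casefold() == "ss")  # True

# Three words of equal length, so only the second key element matters.
words = ["weiß", "Weit", "weis"]

with_lower = sorted(words, key=lambda x: (-len(x), x.lower()))
with_casefold = sorted(words, key=lambda x: (-len(x), x.casefold()))

print(with_lower)     # ['weis', 'Weit', 'weiß']
print(with_casefold)  # ['weis', 'weiß', 'Weit']
```

With lower(), ß compares by its code point and ends up after every ASCII letter; with casefold(), 'weiß' is compared as 'weiss' and lands between 'weis' and 'Weit'.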

Willem Van Onsem
  • Solution works, but have a doubt. With `lower`, the order comes in : `..'dolor', 'ipsum'..` , but without it, the order is `..'ipsum', 'dolor'..` How can `lower` make a difference there? – Kaushik NP Sep 29 '17 at 08:51
  • @KaushikNP: because not all characters have a `lower` in all cultures/languages. It is already a hard problem whether two strings are equivalent. In German for instance `'Foostraße'` and `'Foostrasse'` are frequently seen as the *same* text. See [here](https://stackoverflow.com/a/29247821/67579) for instance. – Willem Van Onsem Sep 29 '17 at 08:53
  • What you say is correct. I don't think you understand my problem though. `>>> sorted(words_list, key=lambda x: (-len(x)))` gives `=> ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'Lorem', 'ipsum', 'dolor', 'amet', 'elit', 'sit', 'sed', 'do']` . Order should not be so for `ipsum` and `dolor` though. – Kaushik NP Sep 29 '17 at 08:55
  • @KaushikNP: yes, but that's why we map `x` to a 2-tuple: `(-len(x),x.lower())` (so in case the two `-len(x)`s are equal, Python will perform a comparison on the second element of the tuple `x.lower()`. – Willem Van Onsem Sep 29 '17 at 08:57
  • Oh, ok. Otherwise there is no comparison? Hmmm. Got it – Kaushik NP Sep 29 '17 at 09:55
0

For anyone with a case like mine, where the strings end in a number and shorter ones should come first:

A = ["a_12", "a_3", "a_11"]

sorted(A, key=lambda x: (len(x), x))

['a_3', 'a_11', 'a_12']
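One caveat: the length trick relies on every string sharing the same prefix. A possible sketch for mixed prefixes is to split off the number and compare it as an integer. This assumes every label has the form name_number; the helper name_then_number and the extra labels are hypothetical, added only for illustration.

```python
def name_then_number(label):
    # Assumes the label ends in "_<integer>", e.g. "a_12".
    prefix, _, number = label.rpartition("_")
    return (prefix, int(number))

labels = ["a_12", "a_3", "a_11", "bb_2", "a_100"]

# Length-based key: "bb_2" lands before "a_100" because it is shorter.
print(sorted(labels, key=lambda x: (len(x), x)))
# ['a_3', 'a_11', 'a_12', 'bb_2', 'a_100']

# Prefix, then numeric suffix: all the "a" labels stay together.
print(sorted(labels, key=name_then_number))
# ['a_3', 'a_11', 'a_12', 'a_100', 'bb_2']
```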