83

I have a set:

set(['booklet', '4 sheets', '48 sheets', '12 sheets'])

After sorting, I want it to look like this:

4 sheets,
12 sheets,
48 sheets,
booklet

Any ideas, please?

SilentGhost
mmrs151

11 Answers

144

Jeff Atwood talks about natural sort and gives an example of one way to do it in Python. Here is my variation on it:

import re

def sorted_nicely(l):
    """Sort the given iterable in the way that humans expect."""
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

Use like this:

s = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
for x in sorted_nicely(s):
    print(x)

Output:

4 sheets
12 sheets
48 sheets
booklet

One advantage of this method is that it doesn't just work when the strings are separated by spaces. It will also work for other separators such as the period in version numbers (for example 1.9.1 comes before 1.10.0).
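
To make the version-number case concrete, here is a minimal sketch of the same splitting idea applied to dotted versions. The name `natural_key` and the sample list are chosen for this illustration only; they are not part of the answer above.

```python
import re

def natural_key(text):
    """Split text into alternating string and integer chunks."""
    # The capturing group makes re.split keep the digit runs, and the
    # result always starts with a (possibly empty) string chunk, so ints
    # are only ever compared with ints and strings with strings.
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'([0-9]+)', text)]

versions = ['1.10.0', '1.9.1', '1.2.10', '1.2.9']
print(sorted(versions, key=natural_key))
# ['1.2.9', '1.2.10', '1.9.1', '1.10.0']
```

A plain `sorted(versions)` would compare character by character and put `1.10.0` ahead of `1.2.9`.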

Mark Byers
  • Hi Jeff, Thank you very much. That is exactly what I was looking for. Good luck. – mmrs151 Apr 21 '10 at 13:58
  • 2
    Is it possible to modify this for a list of tuples based on the first value in the tuple? Example: `[('b', 0), ('0', 1), ('a', 2)]` is sorted to `[('0', 1), ('a', 2), ('b', 0)]` – paragbaxi Jul 27 '11 at 17:58
  • 3
    This function is case sensitive. Upper case strings will take precedence. To fix this add `.lower()` to `key` in `re.split`. – zamber Sep 16 '15 at 14:17
  • @paragbaxi Add `[0]` after `key` in the `alphanum_key` lambda function: ```lambda key: [ convert(c) for c in re.split('([0-9]+)', key[0]) ]``` – Justin Lillico Nov 04 '21 at 22:47
63

Short and sweet:

sorted(data, key=lambda item: (int(item.partition(' ')[0])
                               if item[0].isdigit() else float('inf'), item))

This version:

  • Works in Python 2 and Python 3, because:
    • It does not assume you compare strings and integers (which won't work in Python 3)
    • It doesn't use the cmp parameter to sorted (which doesn't exist in Python 3)
  • Will sort on the string part if the quantities are equal

If you want printed output exactly as described in your example, then:

data = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
print(',\n'.join(r))
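
One limitation worth knowing: the key above calls `int()` on everything before the first space, so an item such as `'4a sheets'` raises `ValueError`. A possible workaround, sketched here under the assumption that only the leading run of digits matters, is to pull that run out with a regular expression. The function name `leading_number_key` is made up for this example.

```python
import re

def leading_number_key(item):
    """Sort by the leading integer if there is one, then by the full string."""
    match = re.match(r'\d+', item)  # digits at the very start of the item only
    number = int(match.group()) if match else float('inf')
    return (number, item)

data = ['booklet', '4a sheets', '48 sheets', '12 sheets', '4 sheets']
print(sorted(data, key=leading_number_key))
# ['4 sheets', '4a sheets', '12 sheets', '48 sheets', 'booklet']
```

This still looks only at the first number in each string, so it does not handle multi-part version numbers.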
Daniel Stutzbach
  • chokes on `4a sheets` but who cares? to fix this you'd need a real function instead of a lambda. – Jean-François Fabre Feb 06 '19 at 20:03
  • 1
    That might work for this trivial example but not for instance a list like ["1. bla", "2. blub"]. Probably the split should be a regex instead, and also sort by the second part afterwards, so ["1 bcd", "2 abc", "1 xyz"] comes out correctly. – FrankyBoy Feb 14 '20 at 09:26
  • Unfortunately, @FrankyBoy is correct, this does not work for sorting sets of version numbers alphanumerically; e.g., v1.0.1, v3.5.3, v3.2.4 – Mdev Apr 29 '21 at 19:52
24

You should check out the third-party library natsort. Its algorithm is general, so it will work for most input.

>>> import natsort
>>> your_list = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> print(',\n'.join(natsort.natsorted(your_list)))
4 sheets,
12 sheets,
48 sheets,
booklet
SethMMorton
10

A simple way is to split the strings into numeric and non-numeric parts and rely on Python's tuple sort order to sort them.

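import re

tokenize = re.compile(r'(\d+)|(\D+)').findall

def natural_sortkey(string):
    # Tag each chunk as (0, number) or (1, text) so that Python 3 never
    # compares an int with a str; numeric chunks sort before text chunks.
    return tuple((0, int(num)) if num else (1, alpha)
                 for num, alpha in tokenize(string))

sorted(my_set, key=natural_sortkey)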
Ants Aasma
8

It was suggested that I repost this answer here, since it works nicely for this case too:

from itertools import groupby
def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby(s, str.isdigit)]

sorted(my_list, key=keyfunc)

Demo:

>>> my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
>>> sorted(my_set, key=keyfunc)
['4 sheets', '12 sheets', '48 sheets', 'booklet']

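On Python 3 the version above raises a TypeError whenever a key that starts with a number is compared with a key that starts with text. Prepending '\0' makes every key start with a string chunk, which avoids that (this version works in Python 2 too):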

def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby('\0'+s, str.isdigit)]
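
As a quick check, this sketch prints the keys the Python 3 version produces for two of the strings from the question; the sample values are the only thing added here.

```python
from itertools import groupby

def keyfunc(s):
    # '\0' is not a digit, so the first group is always text and every key
    # alternates str, int, str, ... starting with a str.
    return [int(''.join(g)) if k else ''.join(g)
            for k, g in groupby('\0' + s, str.isdigit)]

print(keyfunc('4 sheets'))  # ['\x00', 4, ' sheets']
print(keyfunc('booklet'))   # ['\x00booklet']

my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
print(sorted(my_set, key=keyfunc))
# ['4 sheets', '12 sheets', '48 sheets', 'booklet']
```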
John La Rooy
4

Generic answer to sort any numbers in any position in an array of strings. Works with Python 2 & 3.

import re

def alphaNumOrder(string):
   """Return the string with every number zero-padded to 5 digits, so that
   plain string comparison sorts the numbers numerically.
   Ex: alphaNumOrder("a6b12.125")  ==> "a00006b00012.00125"
   """
   return ''.join([format(int(x), '05d') if x.isdigit()
                   else x for x in re.split(r'(\d+)', string)])

Sample:

s = ['a10b20','a10b1','a3','b1b1','a06b03','a6b2','a6b2c10','a6b2c5']
s.sort(key=alphaNumOrder)
s ===> ['a3', 'a6b2', 'a6b2c5', 'a6b2c10', 'a06b03', 'a10b1', 'a10b20', 'b1b1']
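
Note that padding to 5 digits only orders numbers correctly while they have at most 5 digits. Here is a hedged sketch of the same idea with a configurable width; `zero_pad_key` and its `width` parameter are names invented for this illustration, not part of the function above.

```python
import re

def zero_pad_key(string, width=10):
    """Zero-pad every digit run to `width` digits.

    Plain string comparison of the result orders the numbers numerically,
    as long as no number has more than `width` digits.
    """
    return re.sub(r'\d+', lambda m: f'{int(m.group()):0{width}d}', string)

items = ['x100000', 'x99999']
print(sorted(items, key=zero_pad_key))                       # width 10
print(sorted(items, key=lambda s: zero_pad_key(s, width=5)))  # width 5
```

With the default width the result is `['x99999', 'x100000']`; with a width of 5 the six-digit number is no longer padded and the pair comes out in the wrong order.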

Part of the answer is from there

Le Droid
2
>>> a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> def ke(s):
...     i, sp, _ = s.partition(' ')
...     if i.isnumeric():
...         return int(i)
...     return float('inf')
...
>>> sorted(a, key=ke)
['4 sheets', '12 sheets', '48 sheets', 'booklet']
SilentGhost
1

Based on SilentGhost's answer:

In [4]: a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])

In [5]: def f(x):
   ...:     num = x.split(None, 1)[0]
   ...:     if num.isdigit():
   ...:         return (0, int(num))  # numeric items sort first, by value
   ...:     return (1, x)             # other items sort after, alphabetically
   ...: 

In [6]: sorted(a, key=f)
Out[6]: ['4 sheets', '12 sheets', '48 sheets', 'booklet']
draebek
0

Sets are inherently unordered. You'll need to create a list with the same content and sort that.
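
A minimal sketch of both routes, using the set from the question (the variable names are illustrative): copy the set into a list and sort it in place, or let the built-in `sorted()` build the sorted list directly.

```python
my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}

# Route 1: copy into a list, then sort the list in place.
as_list = list(my_set)
as_list.sort()

# Route 2: sorted() accepts any iterable and returns a new sorted list.
result = sorted(my_set)

print(result)
# ['12 sheets', '4 sheets', '48 sheets', 'booklet']
```

Both give plain string order, so a key function is still needed to get `4 sheets` ahead of `12 sheets`.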

Rakis
  • 5
    Not true - the sorted() built-in will take any sequence and return a sorted list. – PaulMcG Apr 19 '10 at 18:00
  • 5
    So instead of creating a list and sorting it, you instead use a builtin to create a sorted list.... Yeah, I was way off. – Rakis Apr 19 '10 at 20:50
  • sets implemented as SortedSets (and not HashSets) are inherently *ordered* – axwell Jun 25 '20 at 15:59
0
b = set(['booklet', '10-b40', 'z94 boots', '4 sheets', '48 sheets',
         '12 sheets', '1 thing', '4a sheets', '4b sheets', '2temptations'])

numList = sorted([x for x in b if x.split(' ')[0].isdigit()],
                 key=lambda x: int(x.split(' ')[0]))

alphaList = sorted([x for x in b if not x.split(' ')[0].isdigit()])

sortedList = numList + alphaList

print(sortedList)

Out: ['1 thing',
      '4 sheets',
      '12 sheets',
      '48 sheets',
      '10-b40',
      '2temptations',
      '4a sheets',
      '4b sheets',
      'booklet',
      'z94 boots']
tldr
-1

For people stuck with a pre-2.4 version of Python, without the wonderful sorted() function, a quick way to sort sets is:

l = list(yourSet)
l.sort() 

This does not answer the specific question above (12 sheets will come before 4 sheets), but it might be useful to people coming from Google.

Giacomo Lacava