I have a list of strings containing numbers and I cannot find a good way to sort them.
For example, I get something like this:

something1
something12
something17
something2
something25
something29

with the sort() method.

I know that I probably need to extract the numbers somehow and then sort the list, but I have no idea how to do it in the simplest way.

Cody Gray - on strike
Michal
  • What is the issue with sort()? – tMC May 11 '11 at 16:24
  • This has a name: natural sort. See http://stackoverflow.com/questions/2545532/python-analog-of-natsort-function-sort-a-list-using-a-natural-order-algorithm and http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort and probably others. – Mark Ransom May 11 '11 at 16:24
  • Why not just `list_name.sort(key=lambda x: float(x.strip('something')))`? – altroware Jun 26 '17 at 13:40

1 Answer

Perhaps you are looking for human sorting (also known as natural sorting):

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=[
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something2', 'something12', 'something17', 'something25', 'something29']
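
To see why this works, it can help to look at the keys themselves. The sketch below repeats the two helper functions so that it runs on its own, then prints the key for one string and sorts a pair of strings that each contain two numbers (the `sth1but…` strings are only illustrative input). For plain ASCII input like this, `re.split` with a capturing group alternates text pieces and digit runs, so two keys are compared text against text and integer against integer, position by position.

```python
import re

def atoi(text):
    # Digit runs become ints; everything else stays a string.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # The capturing group keeps the digit runs in the split result.
    return [atoi(c) for c in re.split(r'(\d+)', text)]

key = natural_keys("something12")
print(key)  # ['something', 12, '']

# Lists compare element by element, so 2 < 10 decides the order here.
pair = sorted(["sth1but10", "sth1but2"], key=natural_keys)
print(pair)  # ['sth1but2', 'sth1but10']
```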

PS. I've changed my answer to use Toothy's implementation of natural sorting (posted in the comments here) since it is significantly faster than my original answer.


If you wish to sort text with floats, then you'll need to change the regex from one that matches ints (i.e. `(\d+)`) to a regex that matches floats:

import re

def atof(text):
    try:
        retval = float(text)
    except ValueError:
        retval = text
    return retval

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    float regex comes from https://stackoverflow.com/a/12643073/190597
    '''
    return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', text) ]

alist=[
    "something1",
    "something2",
    "something1.0",
    "something1.25",
    "something1.105"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something1.0', 'something1.105', 'something1.25', 'something2']
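
As a quick check of the float version, the sketch below prints one key and sorts a slightly longer list. It is self-contained and repeats the functions above; the constant name `FLOAT_RE` is introduced here only for readability. Note that the key compares numeric values, so `1.10` sorts before `1.105` and `1.7` sorts after `1.25`. If version-style ordering (`1.7` before `1.10`) is what you need, the integer version above may be the better fit, since it treats each digit run as a separate number.

```python
import re

# Same float regex as in the answer; FLOAT_RE is just a name chosen for this sketch.
FLOAT_RE = r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)'

def atof(text):
    # Pieces that parse as floats become floats; the rest stay strings.
    try:
        return float(text)
    except ValueError:
        return text

def natural_keys(text):
    return [atof(c) for c in re.split(FLOAT_RE, text)]

key = natural_keys("something1.105")
print(key)  # ['something', 1.105, '']

names = ["something1.7", "something1.10", "something1.105", "something1.25"]
result = sorted(names, key=natural_keys)
print(result)
# ['something1.10', 'something1.105', 'something1.25', 'something1.7']
```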
unutbu
  • I could sort a list of objects which had a sub-property (string) using the above as well. Just replace "text" with something like "someobject", and then `return [ atoi(c) for c in re.split(r'(\d+)', someobject.sometextproperty) ]`. – Jonny Aug 21 '15 at 13:48
  • Do you know how to extend this to the case where the numbers are floats? For example, something1.0, something1.25, something2.0. – painfulenglish May 02 '17 at 10:50
  • @painfulenglish: I've modified the post above to show how to natural sort text with floats. – unutbu May 02 '17 at 19:02
  • To fix pylint warning W1401: Anomalous backslash in string, simply prefix the regex with an 'r' like so: re.split(r'(\d+)', text) – Dylan Hogg Jan 31 '19 at 04:14
  • I've used the above code to sort my list, but any idea why I can't first remove duplicate entries, i.e. `attr1 = set(all_names)` followed by `attr1.sort(key=natural_keys)`? – 2one Sep 19 '19 at 10:47
  • `attr1 = set(...)` makes `attr1` a `set`. Sets don't have a `sort` method, so `attr1.sort` should raise an `AttributeError`. Try instead `attr1 = list(set(all_names))`, since `list`s do have a `sort` method. – unutbu Sep 19 '19 at 16:57
  • This only works with 1 number. Is there an easy extension of this to the case of multiple separate numbers? E.g. with the above implementation I 'unnaturally' get results like `['sth1but10', 'sth1but2']` rather than `['sth1but2', 'sth1but10']`. – FlorianH Apr 13 '20 at 15:56
  • The floating point sort does not work with cases such as this: alist=[ "something1", "something2", "something1.10", "something1.0", "something1.25", "something1.16", "something1.7", "something1.105"] Output: ['something1', 'something1.0', 'something1.10', 'something1.105', 'something1.16', 'something1.25', 'something1.7', 'something2'] – Derek May 04 '21 at 07:34
  • You could also use the `natsorted` function from https://github.com/SethMMorton/natsort – Will Dec 06 '22 at 09:55
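
Two of the points raised in the comments can be sketched together: removing duplicates before sorting, and sorting objects by a string attribute. This is only an illustration; the `Item` class, its `label` attribute, and the sample data are hypothetical and not from the original post. `sorted()` accepts any iterable, including a `set`, and returns a new list, so a separate `list(...)` conversion is optional.

```python
import re
from dataclasses import dataclass

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# Remove duplicates with a set, then sort; sorted() returns a list.
all_names = ["something12", "something2", "something12", "something1"]
unique_sorted = sorted(set(all_names), key=natural_keys)
print(unique_sorted)  # ['something1', 'something2', 'something12']

# Hypothetical object with a string attribute to sort on.
@dataclass
class Item:
    label: str

items = [Item("file10"), Item("file9"), Item("file1")]
items.sort(key=lambda item: natural_keys(item.label))
labels = [item.label for item in items]
print(labels)  # ['file1', 'file9', 'file10']
```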