I'm currently trying to sort a list of the form:

[["Chr1", "949699", "949700"],["Chr11", "3219", "444949"],
["Chr10", "699", "800"],["Chr2", "232342", "235345234"],
["ChrX", "4567", "45634"],["Chr1", "950000", "960000"]]

Using the built-in sorted(), I get:

[['Chr1', '949699', '949700'], ['Chr1', '950000', '960000'], ['Chr10', '699', '800'], ['Chr11', '3219', '444949'], ['Chr2', '232342', '235345234'], ['ChrX', '4567', '45634']]

but I want "Chr2" to come before "Chr10". I adapted some code from the question "Does Python have a built in function for string natural sort?".

My current solution looks like this:

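import re

def naturalSort(l):
    # Turn digit runs into ints so they compare numerically; lowercase the rest.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # Split a string into alternating text and digit chunks, e.g. "Chr10" -> ["chr", 10, ""].
    alphanum_key = lambda key: [convert(c) for c in re.split(r'([0-9]+)', key)]
    if isinstance(l[0], list):
        # List of lists: build one key per field so rows compare column by column.
        return sorted(l, key=lambda k: [alphanum_key(x) for x in k])
    else:
        return sorted(l, key=alphanum_key)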

Yielding the correct order:

[['Chr1', '949699', '949700'], ['Chr1', '950000', '960000'], ['Chr2', '232342', '235345234'], ['Chr10', '699', '800'], ['Chr11', '3219', '444949'], ['ChrX', '4567', '45634']]
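
To make it clearer why this ordering comes out right, here is a minimal sketch of what the key function produces for a few chromosome names. The standalone `alphanum_key` below is just the lambda from the code above written as a function; the sample strings are taken from the list in the question.

```python
import re

def alphanum_key(text):
    # The capturing group makes re.split keep the digit runs in the result.
    parts = re.split(r'([0-9]+)', text)
    # Digit runs become ints, everything else is lowercased text.
    return [int(p) if p.isdigit() else p.lower() for p in parts]

print(alphanum_key("Chr2"))   # ['chr', 2, '']
print(alphanum_key("Chr10"))  # ['chr', 10, '']
print(alphanum_key("ChrX"))   # ['chrx']
```

Because 2 and 10 are compared as integers, "Chr2" lands before "Chr10", while "ChrX" compares as the string 'chrx' against 'chr' and ends up last.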

Is there a better way to do this?

Megatron

2 Answers

I did it like this:

In [1]: l = [["Chr1", "949699", "949700"],["Chr11", "3219", "444949"],["Chr10", "699", "800"],["Chr2", "232342", "235345234"],["ChrX", "4567", "45634"],["Chr1", "950000", "960000"]]

In [2]: sorted(l, key=lambda x: int(x[0].replace('Chr', '')) if x[0].replace('Chr', '').isdigit() else x[0])
Out[2]: 
[['Chr1', '949699', '949700'],
 ['Chr1', '950000', '960000'],
 ['Chr2', '232342', '235345234'],
 ['Chr10', '699', '800'],
 ['Chr11', '3219', '444949'],
 ['ChrX', '4567', '45634']]

Or a more elegant variant:

import re
sorted(l, key=lambda x: int(''.join([i for i in x[0] if i.isdigit()])) if re.findall(r'\d+$', x[0]) else x[0])
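
One caveat: both keys above return an int for numbered chromosomes and a str for "ChrX". Python 2 will order that mix, but Python 3 raises a TypeError when it compares the two. A minimal sketch of a key that avoids this, assuming only the first field matters and that numbered names end in their digits (`chrom_key` and `rows` are hypothetical names for illustration):

```python
import re

def chrom_key(row):
    # Look for trailing digits in the first field ("Chr10" or plain "10").
    match = re.search(r'\d+$', row[0])
    if match:
        # Numbered chromosomes: group 0, ordered by their integer value.
        return (0, int(match.group()), "")
    # Everything else ("ChrX", "X"): group 1, ordered alphabetically.
    return (1, 0, row[0])

rows = [["Chr1", "949699", "949700"], ["Chr11", "3219", "444949"],
        ["Chr10", "699", "800"], ["Chr2", "232342", "235345234"],
        ["ChrX", "4567", "45634"], ["Chr1", "950000", "960000"]]

print([r[0] for r in sorted(rows, key=chrom_key)])
```

Since every key is a tuple of the same shape, an int is never compared with a str. Rows with the same chromosome keep their input order, because only the first field is part of the key.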
greg
  • The input is not always of this form. Sometimes it can also be just "1", "2", "11", "X" without the "Chr" prefix. – Megatron Nov 28 '13 at 16:13
  • Changed the sorting to `sorted(l, key=lambda x: int(''.join([i for i in x[0] if i.isdigit()])) if [i for i in x[0] if i.isdigit()] else x[0])` – greg Nov 28 '13 at 16:24
  • More interesting variant: `import re; sorted(l, key=lambda x: int(''.join([i for i in x[0] if i.isdigit()])) if re.findall(r'\d+$', x[0]) else x[0])` – greg Nov 28 '13 at 16:30

Here's a more compact solution:

import re
natkey = lambda e: [x or int(y) for x, y in re.findall(r'(\D+)|(\d+)', e)]
# A list (not map) so the key is also comparable on Python 3.
print(sorted(data, key=lambda item: [natkey(s) for s in item]))
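
If the fields can be purely numeric or purely alphabetic (for example "1" and "X" without the "Chr" prefix), the key above puts bare ints and strs in the same position, which Python 3 refuses to compare. A possible variant, sketched here under that assumption, tags every token so only like types meet; `row_key` and the sample `data` are made up for illustration:

```python
import re

def natkey(text):
    # Tag each token: (0, n, "") for digit runs, (1, 0, s) for other text,
    # so numbers sort before text and ints never meet strs directly.
    return [(0, int(num), "") if num else (1, 0, txt)
            for txt, num in re.findall(r'(\D+)|(\d+)', text)]

def row_key(row):
    # One natural key per field, so rows compare column by column.
    return [natkey(field) for field in row]

data = [["1", "5", "9"], ["X", "1", "2"], ["11", "3", "4"],
        ["2", "7", "8"], ["1", "10", "20"]]

print(sorted(data, key=row_key))
```

Putting numbers ahead of text mirrors what Python 2 does when it compares an int with a str, so the resulting order should match the original key wherever that one works.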
georg