3

I have a list of filenames:

a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_2-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-10.hamster.raw',
     'data_0-2.hamster.raw']

And I want to sort this list such that I have this output:

a = ['data_0-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-2.hamster.raw',
     'data_0-10.hamster.raw',
     'data_1-0.hamster.raw',
     'data_2-0.hamster.raw',
     'data_9-1.hamster.raw']

This is the code that I wrote:

import re

sorted(a, key=lambda f: int(re.search(r'-(\d+)[^-]*$', f).group(1)))

But I got a bit confused by the re syntax, and this is what I get:

a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_2-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-2.hamster.raw',
     'data_0-10.hamster.raw']

It seems that it does the job for the number after the hyphen but not for the first number.
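
To make the symptom concrete, here is a small sketch that prints the key the lambda above produces for each filename. The helper name second_number is only for illustration; the regex is the one from the question.

```python
import re

a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_2-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-10.hamster.raw',
     'data_0-2.hamster.raw']

def second_number(f):
    # Same regex as in the question: it captures only the digits after the hyphen
    return int(re.search(r'-(\d+)[^-]*$', f).group(1))

keys = [second_number(f) for f in a]
print(keys)
```

Because the key holds only the number after the hyphen, and sorted() is stable, files with equal keys simply keep their original relative order. The first number never takes part in the comparison.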

Braiam
Gaelle Sou

5 Answers

2

You could do the following:

import re


# Raw string so that \d is passed to the regex engine unchanged
pattern = re.compile(r'data_(\d+)-(\d+)')

a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_2-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-10.hamster.raw',
     'data_0-2.hamster.raw']

result = sorted(a, key=lambda s: tuple(map(int, pattern.search(s).groups())))
print(result)

Output

['data_0-0.hamster.raw', 'data_0-1.hamster.raw', 'data_0-2.hamster.raw', 'data_0-10.hamster.raw', 'data_1-0.hamster.raw', 'data_2-0.hamster.raw', 'data_9-1.hamster.raw']
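
One caveat: pattern.search() returns None for a name that does not match, and calling .groups() on None raises AttributeError. A possible sketch for mixed lists follows. The helper numeric_key, the choice to place unmatched names last, and the file 'notes.txt' are all assumptions for illustration, not part of the question.

```python
import re

pattern = re.compile(r'data_(\d+)-(\d+)')

def numeric_key(name):
    match = pattern.search(name)
    if match is None:
        # Assumption: names without the data_N-M part sort after all others
        return (1, 0, 0, name)
    first, second = map(int, match.groups())
    # Leading 0 keeps matched names ahead of unmatched ones
    return (0, first, second, name)

files = ['data_0-10.hamster.raw', 'notes.txt', 'data_0-2.hamster.raw']
result = sorted(files, key=numeric_key)
print(result)
```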
Dani Mesejo
1

This sort key is probably clearer written as a regular function than as a lambda.

import re

def sortkey(string):
    # Grab the 'N-M' part, e.g. '0-10' (raw string keeps \d intact)
    numbering = re.search(r'\d+-\d+', string).group()
    first, second = map(int, numbering.split('-'))
    return first, second

Demo:

>>> a = ['data_1-0.hamster.raw',
...      'data_0-0.hamster.raw',
...      'data_9-1.hamster.raw',
...      'data_2-0.hamster.raw',
...      'data_0-1.hamster.raw',
...      'data_0-10.hamster.raw',
...      'data_0-2.hamster.raw']
>>> sorted(a, key=sortkey)
['data_0-0.hamster.raw',
 'data_0-1.hamster.raw',
 'data_0-2.hamster.raw',
 'data_0-10.hamster.raw',
 'data_1-0.hamster.raw',
 'data_2-0.hamster.raw',
 'data_9-1.hamster.raw']
timgeb
0

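Use the list's .sort() method. Note that without a key it compares the strings character by character, so 'data_0-10.hamster.raw' ends up before 'data_0-2.hamster.raw'; it needs a numeric key to produce the order asked for.

a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_2-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-10.hamster.raw',
     'data_0-2.hamster.raw']

a.sort()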
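
As a quick check, this sketch shows the order a key-less sort gives for these names. It uses sorted() instead of .sort() only so the input list stays unchanged; the variable name plain is illustrative.

```python
a = ['data_1-0.hamster.raw',
     'data_0-0.hamster.raw',
     'data_9-1.hamster.raw',
     'data_2-0.hamster.raw',
     'data_0-1.hamster.raw',
     'data_0-10.hamster.raw',
     'data_0-2.hamster.raw']

# Plain string comparison: '1' < '2', so '0-10' sorts before '0-2'
plain = sorted(a)
for name in plain:
    print(name)
```

The first number happens to come out right here because every value is a single digit; the second number does not, since 10 has two digits.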
Sari Masri
0

Just use the sort function with a key.

Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

So you have: https://repl.it/@skapin/NormalTrustworthyJumpthreading

a = ['data_1-0.hamster.raw',
    'data_0-0.hamster.raw',
    'data_9-1.hamster.raw',
    'data_2-0.hamster.raw',
    'data_0-1.hamster.raw',
    'data_0-10.hamster.raw',
    'data_0-2.hamster.raw']


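def by_id(item):
  # 'data_0-10.hamster.raw' -> '0-10' -> (0, 10); ints compare numerically,
  # whereas the string '0-10' would sort before '0-2'
  first, second = item.split('_')[1].split('.')[0].split('-')
  return int(first), int(second)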

a.sort(key=by_id)
print(a)
Skapin
0

From this answer on sorting by multiple attributes:

A key can be a function that returns a tuple.

We can simplify your RegEx and convert the output to a tuple with:

sorted(a, key=lambda f: tuple(int(i) for i in re.findall(r'\d+', f)))
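
One thing to keep in mind with this approach: re.findall(r'\d+', ...) collects every run of digits in the name, not only the two around the hyphen. A small sketch follows; the helper all_numbers and the second filename are made up for illustration.

```python
import re

def all_numbers(f):
    # Every run of digits in the name becomes one element of the key
    return tuple(int(i) for i in re.findall(r'\d+', f))

key_simple = all_numbers('data_0-10.hamster.raw')
key_extra = all_numbers('run2_data_0-10.v3.raw')  # hypothetical name with extra digits
print(key_simple)
print(key_extra)
```

For the filenames in the question this is not an issue, since each contains exactly two numbers. If other digits can appear, a pattern anchored on the hyphen is the safer choice.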

dmitriys