1

I have a list of paths, which I have simplified into similar but simpler strings here:

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

These paths need sorting in the order of the numbers. The first number (apple) is the most significant in the sort, followed by the second.

One added complication, which may already be clear, is that some of the paths have a third directory level containing the data while others do not.

The MWE of the path structure looks as below:

parent 
|-----apple1 
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple2
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple10
          |------banana1 
                   |-----carrot1
                            |-----data*
                   |-----carrot2
                            |-----data*
          |------banana2 
                   |----- carrot1
                             |-----data*

The desired output is:

paths = ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']

I'm struggling to work out how to do this. A plain sort will not work, because the strings are compared character by character and the numbers go into double digits, so 10 would come before 2.
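To make the problem concrete, here is a minimal sketch of what the default string sort does with the sample list (the variable name `plain` is only for illustration):

```python
paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2',
         'apple10/banana1/carrot1']

# Default sort compares the strings character by character,
# so 'apple10/...' lands before 'apple2/...' because '1' < '2'.
plain = sorted(paths)
print(plain)
```

The `apple10` entries end up between `apple1` and `apple2`, which is not the order wanted.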

I have seen another answer, "How to correctly sort a string with a number inside?", which works with single numbers in a list of strings, but I've failed to adapt it to my problem.

Sunderam Dubey
Allentro

4 Answers

3

Try with sorted, supplying a custom key that uses re to extract all numbers from the path:

>>> import re
>>> sorted(paths, key=lambda x: list(map(int, re.findall(r"\d+", x))))
['apple1/banana1',
 'apple1/banana2',
 'apple2/banana1',
 'apple2/banana2',
 'apple10/banana1/carrot1',
 'apple10/banana1/carrot2',
 'apple10/banana2/carrot1']
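As a rough illustration of why this key works, the sketch below prints the key computed for two of the paths. The helper name `numeric_key` is hypothetical; it does the same thing as the lambda above.

```python
import re

def numeric_key(path):
    # Hypothetical helper: every run of digits in the path, as a list of ints.
    return [int(n) for n in re.findall(r"\d+", path)]

short_key = numeric_key('apple2/banana1')
long_key = numeric_key('apple10/banana1/carrot2')
print(short_key, long_key)

# Lists compare element by element, so 2 < 10 decides the order here.
print(short_key < long_key)
```

Note that this key looks only at the numbers. Two paths with the same numbers but different names get equal keys, and since Python's sort is stable they simply keep their original relative order.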
not_speshal
1

In addition to @not_speshal's answer:

Based on the answer to the question you linked, if the first word in the path is not necessarily "apple", you can do something like this:

import re

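def atoi(text):
    # Turn a run of digits into an int; leave anything else as a string.
    return int(text) if text.isdigit() else text

def word_and_num_as_tuple(text):
    # 'apple10' -> ('apple', 10, ''): the capturing group keeps the digits in the split.
    return tuple(atoi(c) for c in re.split(r'(\d+)', text))

def path_as_sortable_tuple(path, sep='/'):
    # One inner tuple per path component, so components are compared in order.
    return tuple(word_and_num_as_tuple(word_in_path) for word_in_path in path.split(sep))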

paths = [
    'apple10/banana2/carrot1',
    'apple10/banana1/carrot2',
    'apple2/banana1',
    'apple2/banana2',
    'apple1/banana1',
    'apple1/banana2',
    'apple10/banana1/carrot1'
]


paths.sort(key=path_as_sortable_tuple)
print(paths)

# And, of course, as a lambda one-liner:
paths.sort( key= lambda path: tuple( tuple( int(char_seq) if char_seq.isdigit() else char_seq for char_seq in re.split(r'(\d+)', subpath) ) for subpath in path.split('/') ) )

It does exactly what @MarcinCuprjak suggested, but automatically.

  • Ah, beat me to it and even better. You don't need the split on / right? – perreal Apr 26 '22 at 20:51
  • @perreal Nice point. But I guess that without `split` it will only work if each path component ends with a number. For paths like `apple1banana/orange11`, the result of `re.split(...)` will be `["apple", "1", "banana/orange", "11"]`, which may be counterintuitive and lead to incorrect sorts. I haven't tested it, though, and may be wrong – Евгений Крамаров Apr 26 '22 at 21:29
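The split discussed in the comments is easy to check. The sketch below compares splitting the whole path with splitting each component separately; `split_digits` is a hypothetical helper name wrapping the same `re.split` call as the answer.

```python
import re

def split_digits(text):
    # Same split as in the answer; the capturing group keeps the digit runs.
    return re.split(r'(\d+)', text)

path = 'apple1banana/orange11'

# Splitting the whole path: the separator ends up inside a text chunk.
whole = split_digits(path)
print(whole)          # ['apple', '1', 'banana/orange', '11', '']

# Splitting per component first keeps each directory name separate.
per_component = [split_digits(part) for part in path.split('/')]
print(per_component)  # [['apple', '1', 'banana'], ['orange', '11', '']]
```

Without the split on `/`, the separator takes part in the string comparison like any other character. Whether that changes the resulting order depends on the names involved, so splitting per component is probably the safer choice.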
0

If you can represent your data as tuples instead of strings, then things get easier:

paths = [('apple', 10, 'banana', 2, 'carrot', 1),
         ('apple', 10, 'banana', 1, 'carrot', 2),
         ('apple', 2, 'banana', 1),
         ('apple', 2, 'banana', 2),
         ('apple', 1, 'banana', 1),
         ('apple', 1, 'banana', 2),
         ('apple', 10, 'banana', 1, 'carrot', 1)
         ]

paths.sort(key=lambda item: (len(item), item))
print(paths)

The output is, I think, what you want:

[('apple', 1, 'banana', 1), ('apple', 1, 'banana', 2), ('apple', 2, 'banana', 1), ('apple', 2, 'banana', 2), ('apple', 10, 'banana', 1, 'carrot', 1), ('apple', 10, 'banana', 1, 'carrot', 2), ('apple', 10, 'banana', 2, 'carrot', 1)]
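If the data only exists as strings, one possible way to build such tuples is sketched below. The helper `path_to_tuple` is hypothetical, and it assumes every path component is a run of ASCII letters followed by a run of digits, as in the question.

```python
import re

def path_to_tuple(path):
    # Hypothetical helper: pull out runs of letters and runs of digits,
    # dropping the '/' separators, and turn the digit runs into ints.
    tokens = re.findall(r'[A-Za-z]+|\d+', path)
    return tuple(int(t) if t.isdigit() else t for t in tokens)

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2',
         'apple10/banana1/carrot1']

tuples = [path_to_tuple(p) for p in paths]
# Same key as above: shorter tuples first, then ordinary tuple comparison.
tuples.sort(key=lambda item: (len(item), item))
print(tuples)
```

Keep in mind that putting `len(item)` first in the key places every shorter path ahead of every longer one, whatever the numbers are. That matches the sample data, but if it is not what you want in general, sorting on the tuple alone is an option.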
Marcin Cuprjak
  • This answer is great but would be better if it included a way to automatically convert the paths into those tuples. – Stef Apr 27 '22 at 13:54
0

Using the following tools:

  • itertools.groupby with str.isdigit to group characters into contiguous groups of digits or non-digits;
  • ''.join to form words from the groups of characters;
  • a list comprehension to iterate on the groups and filter out the groups of non-digits;
  • int to convert words into ints if they come from a group of digits.
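The individual steps listed above can be seen on a single string. This is only a sketch for illustration; the names `groups` and `numbers` are not part of the answer's code.

```python
from itertools import groupby

s = 'apple10/banana2'

# Group consecutive characters by whether they are digits, then join each group.
groups = [(are_digits, ''.join(chars))
          for are_digits, chars in groupby(s, key=str.isdigit)]
print(groups)   # [(False, 'apple'), (True, '10'), (False, '/banana'), (True, '2')]

# Keep only the digit groups and convert them to ints.
numbers = tuple(int(word) for are_digits, word in groups if are_digits)
print(numbers)  # (10, 2)
```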

Combining those tools into a tuple key for sorted:

from itertools import groupby

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

sorted(paths,
       key=lambda s: tuple(int(''.join(group))
                           for are_digits,group in groupby(s, key=str.isdigit)
                           if are_digits))
# ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']
Stef