-1

I have a list of directories and I would like to sort the list based on the date followed by a number. Here is an example of the unsorted list:

L = ['C:\\Users\\...\\file1\\sample_nov1_1',
    'C:\\Users\\...\\file2\\sample_sep1_1',
    'C:\\Users\\...\\file3\\sample_oct15_2',
    'C:\\Users\\...\\file2\\sample_sep1_2',
    'C:\\Users\\...\\file4\\sample_sep10_2',
    'C:\\Users\\...\\file4\\sample_sep10_1']

I would like to sort it so I get the following output:

['C:\\Users\\...\\sample_sep1_1',
 'C:\\Users\\...\\sample_sep1_2',
 'C:\\Users\\...\\sample_sep10_1',
 'C:\\Users\\...\\sample_sep10_2',
 'C:\\Users\\...\\sample_oct15_2',
 'C:\\Users\\...\\sample_nov1_1']

I get this list by walking a parent directory, but because the files were not created in the order I want for the output, I am not sure I can change that part of the code. I have already looked at a few other answers such as this one, but they do not have the same complications that I have here. How can I achieve this? I suppose regular expressions might simplify it a bit, but I am not sure that is the correct approach.

Rob

4 Answers

1

Use the following key function, which orders the entries by month, then by day, then by the trailing number:

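month = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
# Sort by month position, then day as an integer (so 2 sorts before 10), then the trailing part.
L.sort(key=lambda value: (month.index(value.split('_')[-2][:3]), int(value.split('_')[-2][3:]), value.split('_')[-1]))
print(L)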
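
If the real names carry an extension such as `.pdf`, a regular expression is one way to pull out the month, day, and number as separate pieces. The following is only a minimal sketch: the helper `date_key`, the constant names, and the assumed name shape `..._<mon><day>_<num>[.ext]` are illustrative assumptions, not something confirmed by the question.

```python
import re

MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

# Assumed name shape: ..._<mon><day>_<num>, optionally followed by an extension.
PATTERN = re.compile(r'_([a-z]{3})(\d+)_(\d+)(?:\.\w+)?$')


def date_key(path):
    """Return (month index, day, number) for sorting; hypothetical helper."""
    m = PATTERN.search(path)
    if m is None:
        # Names that do not fit the assumed shape are reported, not guessed at.
        raise ValueError(f'unexpected name: {path!r}')
    mon, day, num = m.groups()
    return MONTHS.index(mon), int(day), int(num)


paths = ['C:\\Users\\...\\sample_sep14_1.pdf',
         'C:\\Users\\...\\sample_sep2_2.pdf',
         'C:\\Users\\...\\sample_sep10_2.pdf',
         'C:\\Users\\...\\sample_nov1_1.pdf']
print(sorted(paths, key=date_key))
```

Because the day and the number are compared as integers, `sep2` sorts before `sep10`, and a name that does not match the pattern raises a `ValueError` instead of being placed silently.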

Shadowcoder
  • This does not sort the months properly. It shows `nov` before `sep`. – Rob Nov 18 '20 at 07:45
  • Thanks for the fix. Now it does not sort them by date properly. See the example below: ```L = ['C:\\Users\\...\\sample_sep14_1', 'C:\\Users\\...\\sample_sep14_2', 'C:\\Users\\...\\sample_sep10_2', 'C:\\Users\\...\\sample_sep16_1']``` It returns: ```['C:\\Users\\...\\sample_sep14_1', 'C:\\Users\\...\\sample_sep16_1', 'C:\\Users\\...\\sample_sep14_2', 'C:\\Users\\...\\sample_sep10_2']``` But it should return: ```['C:\\Users\\...\\sample_sep10_2', 'C:\\Users\\...\\sample_sep14_1', 'C:\\Users\\...\\sample_sep14_2', 'C:\\Users\\...\\sample_sep16_1']``` – Rob Nov 18 '20 at 08:07
  • Your code works fine! Thank you very much! I am marking it as the accepted answer since it is very concise and simple. – Rob Nov 18 '20 at 08:46
1

This should work:

import re

files = ['C:\\Users\\...\\sample_sep1_1',
 'C:\\Users\\...\\sample_sep1_2',
 'C:\\Users\\...\\sample_sep10_1',
 'C:\\Users\\...\\sample_sep10_2',
 'C:\\Users\\...\\sample_nov1_1',
 'C:\\Users\\...\\sample_oct15_2']

lf = [file.split("\\")[-1].split("_") for file in files]

R = []
for index, x in enumerate(lf):
    dval, num = x[1], int(x[2].split(".")[0])
    grps = re.match("([a-z]+)([0-9]+)", dval).groups()
    R.append((grps[0], int(grps[1]), num, files[index]))

month_map = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6, 'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}

sorted_files = sorted(R, key=lambda x: (month_map[x[0]], x[1], x[2]))
print(sorted_files)

Output:

[('sep', 1, 1, 'C:\\Users\\...\\sample_sep1_1'), ('sep', 1, 2, 'C:\\Users\\...\\sample_sep1_2'), ('sep', 10, 1, 'C:\\Users\\...\\sample_sep10_1'), ('sep', 10, 2, 'C:\\Users\\...\\sample_sep10_2'), ('oct', 15, 2, 'C:\\Users\\...\\sample_oct15_2'), ('nov', 1, 1, 'C:\\Users\\...\\sample_nov1_1')]
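
The result above is a list of tuples with the original path in the last position. If only the paths are needed, a list comprehension can pull them back out. A small sketch, where `sorted_paths` is a name introduced here for illustration and the two tuples are a shortened stand-in for the real `sorted_files`:

```python
# Shortened stand-in for the `sorted_files` list of tuples printed above.
sorted_files = [('sep', 1, 1, 'C:\\Users\\...\\sample_sep1_1'),
                ('oct', 15, 2, 'C:\\Users\\...\\sample_oct15_2')]

# Keep only the fourth element of each tuple, which is the original path.
sorted_paths = [entry[3] for entry in sorted_files]
print(sorted_paths)
```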

Serial Lazer
  • Works fine for the example I gave, but when I try it on the actual list it raises an error: `ValueError: invalid literal for int() with base 10: '1.pdf'` – Rob Nov 18 '20 at 08:11
  • @Rob Do your files not have the names you mentioned in your question? Where does `.pdf` come into the picture? Honestly, none of the answers here will work if your question gives incomplete details – Serial Lazer Nov 18 '20 at 08:15
  • I did not anticipate it would make a huge difference. But yes, all the files are PDFs, so they end with `.pdf`. My apologies! @Shadowcoder's answer still works, but it is not sorting them in order of dates. If that gets solved, it will work. – Rob Nov 18 '20 at 08:20
  • @Rob try now, I made a minor change to work with extensions – Serial Lazer Nov 18 '20 at 08:22
  • I verified locally, works fine with extensions now! – Serial Lazer Nov 18 '20 at 08:23
  • Thank you very much for your help! I realized that only two files ended like `sample_nov1_1string.pdf`, so I removed them manually and then your code worked as expected. The other point is that I only wanted a list, but your solution returns tuples, so I added `L = []` followed by `for i in sorted_files: L.append(i[3])` to get that. Other than that the code works fine. Thanks! – Rob Nov 18 '20 at 08:40
0

This is what I have so far. Are there any other edge cases I should test?


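    month_value = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
                   "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}
    # `a` is the list of paths; [7:] skips the "sample_" prefix of the file name.
    # The day and trailing number are converted to int so that 2 sorts before 10.
    out = sorted(a, key=lambda x: (month_value[x.split("\\")[-1][7:][:3]],
                                   [int(n) for n in x.split("\\")[-1][7:][3:].split('_')]))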
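
One edge case that seems worth testing is a mix of one- and two-digit days, because digit strings compare character by character rather than by value. A tiny sketch with made-up day values:

```python
days = ['10', '2', '15']

as_text = sorted(days)              # compared character by character
as_numbers = sorted(days, key=int)  # compared by numeric value

print(as_text)     # ['10', '15', '2']
print(as_numbers)  # ['2', '10', '15']
```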

Joonyoung Park
-1

Can you try this:

import os
from pathlib import Path

paths = sorted(Path("<file_dir>").iterdir(),key=os.path.getmtime)

print(paths)

Maybe it will help you.

yny